muHVT: An Introduction

Zubin Dowlaty, Shubhra Prakash, Sangeet Moy Das, Praditi Shah, Shantanu Vaidya, Somya Shambhawi

2023-06-22

1 Abstract

The muHVT package is a collection of R functions for building topology-preserving maps to support rich multivariate data analysis. It is geared towards big data, that is, datasets with a large number of rows. The functions are organized around the following typical workflow:

  1. Data Compression: Vector quantization (VQ), HVQ (hierarchical vector quantization) using means or medians. This step compresses the rows (long data frame) using a compression objective.

  2. Data Projection: Dimension projection of the compressed cells to 1D, 2D or 3D with Sammon’s non-linear algorithm. This step creates topology-preserving map (also called embedding) coordinates in the desired output dimension.

  3. Tessellation: Create the cells required for object visualization using the Voronoi tessellation method. The package includes heatmap plots for hierarchical Voronoi tessellations (HVT). This step enables data insights, visualization, and interaction with the topology-preserving map, which is useful for semi-supervised tasks.

  4. Prediction: Scoring new data sets and recording their assignment using the map objects from the above steps, in a sequence of maps if required.

2 Data Compression

Compression is a technique used to reduce the data size while preserving its essential information, allowing for efficient storage and decompression to reconstruct the original data. Vector quantization (VQ) is a data compression technique that represents a set of data points with a smaller number of representative vectors. It achieves compression by exploiting redundancies or patterns in the data and replacing similar data points with a representative vector.

This package offers several advantages for performing data compression, as it is designed to handle high-dimensional data efficiently. It provides a hierarchical compression approach, allowing a multi-resolution representation of the data. The hierarchical structure enables efficient compression and storage of the data while preserving different levels of detail. HVT aims to preserve the topological structure of the data during compression. Spatial data with irregular shapes and complex structures in high-dimensional data can contain valuable information about relationships and patterns. HVT seeks to capture and retain these topological characteristics, enabling meaningful analysis and visualization. This package employs tessellation to divide the compressed data space into distinct cells or regions while preserving the topology of the original data. This means that the relationships and connectivity between data points are maintained in the compressed representation.

This package can perform vector quantization using the following algorithms:

2.1 Hierarchical Vector Quantization

2.1.1 Using k-means

  1. The k-means algorithm randomly selects k data points as initial means.
  2. k clusters are formed by assigning each data point to its closest cluster mean using the Euclidean distance.
  3. Virtual means for each cluster are calculated by using all datapoints contained in a cluster.

The second and third steps are iterated until a predefined number of iterations is reached or the clusters converge. Each iteration takes O(n) time for a fixed number of clusters k and a fixed number of dimensions.
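The three steps above can be tried out with the kmeans function from base R. This is only a minimal sketch on made-up toy data, not the code path used inside the package. Note that kmeans uses the Euclidean (L2) distance, and the toy data and object names here are assumptions for illustration.

```r
# Minimal sketch: k-means on toy 2D data with base R (stats::kmeans).
set.seed(1)
# Two well-separated groups of 50 points each (toy data, not from this vignette)
toy <- rbind(
  matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(100, mean = 5, sd = 0.3), ncol = 2)
)
# algorithm = "Lloyd" alternates the assignment step and the mean-update step
km <- kmeans(toy, centers = 2, iter.max = 50, nstart = 5, algorithm = "Lloyd")
km$centers         # one mean (codeword) per cluster
table(km$cluster)  # number of points assigned to each cluster
```

The nstart argument repeats the random initialization several times and keeps the best run, which reduces the dependence on the initial choice of means.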

2.1.2 Using k-medoids

  1. The k-medoids algorithm randomly selects k of the n data points as the initial medoids.
  2. k clusters are formed by assigning each data point to its closest medoid using any common distance metric.
  3. The medoid of each cluster is updated to the data point in the cluster that has the smallest total distance to the other points in the cluster.

The second and third steps are iterated until a predefined number of iterations is reached or the clusters converge. The runtime for the algorithm is O(k * (n-k)^2) per iteration.
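To make the difference to k-means concrete, here is a small base R sketch of an alternating k-medoids procedure. It is a simplification for illustration and not the swap-based search that dedicated implementations use; the function name simple_kmedoids, the L1 distance choice and the toy data are assumptions. It also assumes there are no duplicate points.

```r
# Simple alternating k-medoids sketch in base R (illustration only).
simple_kmedoids <- function(x, k, max_iter = 20) {
  d <- as.matrix(dist(x, method = "manhattan"))  # pairwise L1 distances
  medoids <- sample(nrow(x), k)                  # random initial medoids
  for (iter in seq_len(max_iter)) {
    # Assignment step: each point joins its closest medoid
    cluster <- apply(d[, medoids, drop = FALSE], 1, which.min)
    # Update step: in each cluster, pick the member with the smallest
    # total distance to the other members
    new_medoids <- vapply(seq_len(k), function(j) {
      members <- unname(which(cluster == j))
      members[which.min(rowSums(d[members, members, drop = FALSE]))]
    }, integer(1))
    if (setequal(new_medoids, medoids)) break
    medoids <- new_medoids
  }
  # Final assignment so that the labels match the returned medoids
  cluster <- unname(apply(d[, medoids, drop = FALSE], 1, which.min))
  list(medoids = medoids, cluster = cluster)
}

set.seed(2)
toy <- rbind(
  matrix(rnorm(40, mean = 0), ncol = 2),
  matrix(rnorm(40, mean = 6), ncol = 2)
)
res <- simple_kmedoids(toy, k = 2)
toy[res$medoids, ]  # unlike means, medoids are actual rows of the data
```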

This algorithm divides the dataset recursively into cells using the \(k\)-means or \(k\)-medoids algorithm. The maximum number of subsets is set by the n_cells parameter. Setting n_cells to, say, five divides the dataset into a maximum of five subsets. These five subsets are each further divided into five subsets (or fewer), resulting in a total of up to twenty-five (5*5) subsets. The recursion terminates when a cell either contains fewer than three data points or a stop criterion is reached. In this case, the stop criterion is met when the cell error falls below the quantization error threshold.

The steps for this method are as follows:

  1. Select k (number of cells), depth and quantization error threshold.
  2. Perform quantization (using \(k\)-means or \(k\)-medoids) on the input dataset.
  3. Calculate the quantization error for each of the k cells.
  4. Compare the quantization error of each cell to the quantization error threshold.
  5. Repeat steps 2 to 4 for each of the k cells whose quantization error is above the threshold until a stop criterion is reached.

The recursion stops for a cell when one of the following conditions is satisfied:

  • the quantization error of the cell falls below the quantization error threshold.
  • there are fewer than three data points in the cell.
  • the user-specified depth has been attained.
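The recursion and its three stop conditions can be sketched in a few lines of base R. This is a hedged illustration built on kmeans, not the package's hvq implementation; the helper names l1_max_error and hvq_sketch, the toy data and the chosen threshold are all assumptions.

```r
# Recursive sketch of hierarchical VQ with base R kmeans (illustration only).
l1_max_error <- function(points, centroid) {
  # Largest L1 distance from any point in the cell to the cell centroid
  max(rowSums(abs(sweep(points, 2, centroid))))
}

hvq_sketch <- function(points, k, depth, threshold, level = 1, path = "") {
  km <- kmeans(points, centers = min(k, nrow(points) - 1), nstart = 3)
  cells <- lapply(seq_len(nrow(km$centers)), function(j) {
    members <- points[km$cluster == j, , drop = FALSE]
    qe <- l1_max_error(members, km$centers[j, ])
    cell_path <- paste0(path, "/", j)
    # Split again only if no stop condition holds for this cell
    split_further <- qe > threshold && nrow(members) >= 3 && level < depth
    if (split_further) {
      hvq_sketch(members, k, depth, threshold, level + 1, cell_path)
    } else {
      data.frame(path = cell_path, level = level, n = nrow(members), qe = qe)
    }
  })
  do.call(rbind, cells)
}

set.seed(7)
toy <- matrix(runif(600), ncol = 2)  # 300 toy points in the unit square
leaves <- hvq_sketch(toy, k = 5, depth = 3, threshold = 0.2)
head(leaves)  # one row per final cell: path, level, size and quantization error
```

Every point ends up in exactly one final cell, and each final cell either meets the threshold, has fewer than three points, or sits at the maximum depth.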

The quantization error for a cell is defined as follows:

\[QE = \max_i(||A-F_i||_{p})\]

where

  • \(A\) is the centroid of the cell
  • \(F_i\) represents a data point in the cell
  • \(m\) is the number of points in the cell
  • \(p\) is the \(p\)-norm metric. Here \(p\) = 1 represents L1 Norm and \(p\) = 2 represents L2 Norm

2.1.3 Quantization Error

Let us try to understand quantization error with an example.

Figure 1: The Voronoi tessellation for level 1 shown for the 5 cells with the points overlayed

An example of a 2 dimensional VQ is shown above.

In the above image, we can see 5 cells with each cell containing a certain number of points. The centroid for each cell is shown in blue. These centroids are also known as codewords since they represent all the points in that cell. The set of all codewords is called a codebook.

Now we want to calculate quantization error for each cell. For the sake of simplicity, let’s consider only one cell having centroid A and m data points \(F_i\) for calculating quantization error.

For each point, we calculate the distance between the point and the centroid.

\[ d = ||A - F_i||_{p} \]

In the above equation, p = 1 means L1_Norm distance whereas p = 2 means L2_Norm distance. In the package, the L1_Norm distance is chosen by default. The user can pass either L1_Norm, L2_Norm or a custom function to calculate the distance between two points in n dimensions.

\[QE = \max_i(||A-F_i||_{p})\]

Now, we take the maximum calculated distance of all m points. This gives us the furthest distance of a point in the cell from the centroid, which we refer to as Quantization Error. If the Quantization Error is higher than the given threshold, the centroid/ codevector is not a good representation for the points in the cell. Now we can perform further Vector Quantization on these points and repeat the above steps.

Please note that the user can select mean, max or any custom function to calculate the Quantization Error. The custom function takes a vector of m values (where each value is the distance between a point and the centroid in n dimensions) and returns a single value, which is the Quantization Error for the cell.

If we select mean as the error metric, the above Quantization Error equation will look like this:

\[QE = \frac{1}{m}\sum_{i=1}^m||A-F_i||_{p}\]
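As a small worked example of both formulas, the sketch below computes the quantization error of a single toy cell in base R. The helper name quantization_error and the four toy points are assumptions made for illustration and are not part of the package.

```r
# Hypothetical helper: quantization error of one cell, following the formulas above.
quantization_error <- function(points, centroid, p = 1, error_metric = max) {
  diffs <- abs(sweep(points, 2, centroid))
  distances <- rowSums(diffs^p)^(1 / p)  # p-norm distance of each point to the centroid
  error_metric(distances)
}

cell <- rbind(c(0, 0), c(2, 0), c(0, 4), c(2, 4))  # four toy points
A <- colMeans(cell)                                # centroid (1, 2)
qe_max_l1  <- quantization_error(cell, A, p = 1, error_metric = max)
qe_mean_l2 <- quantization_error(cell, A, p = 2, error_metric = mean)
qe_max_l1   # 3, since every point is |1| + |2| away from the centroid
qe_mean_l2  # sqrt(5), since every point is sqrt(1^2 + 2^2) away
```

Because error_metric is just a function of the distance vector, a custom metric such as median can be passed in the same way.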

3 Data Projection

Projection mainly involves converting data from its original form to a different space or coordinate system while preserving certain properties of it. By projecting data into a common coordinate system, spatial relationships, distances, areas, and other spatial attributes can be accurately measured and compared.

HVT performs projection as part of its workflow to visualize and explore high-dimensional data. The projection step in HVT involves mapping the compressed data, represented by the hierarchical structure of cells, onto a lower-dimensional space for visualization purposes, as human perception is more suited to interpreting information in lower-dimensional spaces. Users can zoom in/out, rotate, and explore different regions of the projected space to gain insights and understand the data from different perspectives.

Sammon’s projection is an algorithm used in this package to map a high-dimensional space to a space of lower dimensionality while attempting to preserve the structure of inter-point distances in the projection. It is particularly suited for use in exploratory data analysis and is usually considered a non-linear approach since the mapping cannot be represented as a linear combination of the original variables. The centroids are plotted in 2D after performing Sammon’s projection at every level of the tessellation.

Let \(d_{ij}^*\) denote the distance between the \(i^{th}\) and \(j^{th}\) objects in the original space, and \(d_{ij}\) the distance between their projections. Sammon’s mapping aims to minimize the error function below, which is often referred to as Sammon’s stress or Sammon’s error.

\[E=\frac{1}{\sum_{i<j} d_{ij}^*}\sum_{i<j}\frac{(d_{ij}^*-d_{ij})^2}{d_{ij}^*}\]

The minimization can be performed either by gradient descent, as proposed initially, or by other means, usually involving iterative methods. The number of iterations needs to be determined experimentally, and convergence is not always guaranteed. Many implementations use the first principal components as a starting configuration.
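To see what the error function measures, the following base R sketch evaluates Sammon’s stress for a principal components projection, which is the kind of starting configuration mentioned above. It only evaluates the stress and does not perform the minimization; the helper name sammon_stress and the toy data are assumptions, and the function assumes there are no duplicate points (which would give zero distances in the denominator).

```r
# Hypothetical helper: Sammon's stress between original and projected distances.
sammon_stress <- function(d_orig, d_proj) {
  d_orig <- as.vector(d_orig)  # a dist object stores each pair i < j once
  d_proj <- as.vector(d_proj)
  sum((d_orig - d_proj)^2 / d_orig) / sum(d_orig)
}

set.seed(3)
x <- matrix(rnorm(150), ncol = 3)  # 50 toy points in 3D
d_star <- dist(x)                  # distances in the original space
pca_2d <- prcomp(x)$x[, 1:2]       # first two principal components
stress_pca <- sammon_stress(d_star, dist(pca_2d))
stress_perfect <- sammon_stress(d_star, d_star)
stress_pca      # stress of the PCA starting configuration
stress_perfect  # 0 when all distances are preserved exactly
```

An iterative Sammon’s projection would start from a configuration like pca_2d and move the points to reduce this value.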

4 Tessellation

A Voronoi diagram is a way of dividing space into a number of regions. A set of points (called seeds, sites, or generators) is specified beforehand, and for each seed there is a corresponding region consisting of all points closer to that seed than to any other. These regions are called Voronoi cells. The Voronoi diagram is the dual of the Delaunay triangulation, a triangulated mesh built from a set of points in a plane with the property that no data point lies inside the circumcircle of any triangle in the triangulation. This property guarantees that the resulting cells in the tessellation do not overlap with each other.
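The definition of a Voronoi cell can be illustrated without drawing anything: a point belongs to the cell of its nearest seed. The base R sketch below uses made-up seeds and query points and the Euclidean distance; it is only meant to illustrate the definition and does not construct the cell boundaries.

```r
# Voronoi cell membership as nearest-seed lookup (toy illustration).
seeds <- rbind(c(0, 0), c(4, 0), c(0, 4))          # three toy seeds
queries <- rbind(c(1, 1), c(3, 0.5), c(0.5, 3.5))  # points to locate
# Squared Euclidean distance from every query (rows) to every seed (columns)
d2 <- outer(seq_len(nrow(queries)), seq_len(nrow(seeds)),
            Vectorize(function(i, j) sum((queries[i, ] - seeds[j, ])^2)))
voronoi_cell <- apply(d2, 1, which.min)
voronoi_cell  # 1 2 3: each query falls in the cell of a different seed
```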

By using Delaunay triangulation, HVT can achieve a partitioning of the data space into distinct and non-overlapping regions, which is crucial for accurately representing and analyzing the compressed data. Additionally, the use of Delaunay triangulation for tessellation ensures that the resulting cells have well-defined shapes, typically triangles in two dimensions or tetrahedra in three dimensions.

The hierarchical structure resulting from tessellation preserves the inherent structure and relationships within the data. It captures clusters, subclusters, and other patterns in the data, allowing for a more organized and interpretable representation. The hierarchical structure reduces redundancy and enables more compact representations.

Tessellate: Constructing Voronoi Tessellation

In this package, we use the sammon function from the MASS package to project higher dimensional data to a 2D space. The function hvq, called from the HVT function, returns hierarchical quantized data, which is the input for constructing the tessellations. The data is then represented in 2D coordinates and the tessellations are plotted using these coordinates as centroids. We use the package deldir for this purpose. The deldir package computes the Delaunay triangulation (and hence the Dirichlet or Voronoi tessellation) of a planar point set according to the second (iterative) algorithm of Lee and Schacter. For subsequent levels, a transformation is performed on the 2D coordinates so that all the points lie within their parent tile. Tessellations are plotted using these transformed points as centroids. The lines in the tessellations are chopped in places so that they do not protrude outside the parent polygon. This is done for all the subsequent levels.

5 Prediction

Prediction refers to the process of estimating future values or outcomes based on existing data patterns. A model is developed based on historical data or a training dataset, and this model is then used to make predictions on new, unseen data. The model captures the underlying patterns, trends, and relationships present in the training data, allowing it to make informed predictions on similar or related data points.

In this package, we use the predictHVT function to assign each point in the test dataset to a cell.

Prediction Algorithm

The prediction algorithm recursively calculates the distance between each point in the test dataset and the cell centroids for each level. The following steps explain the prediction method for a single point in the test dataset:

  1. Calculate the distance between the point and the centroid of all the cells in the first level.
  2. Find the cell whose centroid has minimum distance to the point.
  3. Check if the cell drills down further to form more cells.
  4. If it doesn’t, return the path. Or else repeat steps 1 to 4 till we reach a level at which the cell doesn’t drill down further.
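The four steps above can be sketched in base R for a single point. The nested-list codebook used here is a made-up structure chosen for illustration and is not the object returned by the package; make_cell and predict_path are hypothetical helper names, and the L1 distance is assumed.

```r
# Hypothetical nested-list codebook: each cell has a centroid and optional children.
make_cell <- function(centroid, children = list()) {
  list(centroid = centroid, children = children)
}

predict_path <- function(point, cells,
                         distance = function(a, b) sum(abs(a - b))) {
  path <- integer(0)
  while (length(cells) > 0) {
    # Steps 1-2: distance to every centroid at this level, keep the closest cell
    d <- vapply(cells, function(cell) distance(point, cell$centroid), numeric(1))
    best <- which.min(d)
    path <- c(path, best)
    # Steps 3-4: descend if the winning cell was split further, otherwise stop
    cells <- cells[[best]]$children
  }
  path
}

codebook <- list(
  make_cell(c(0, 0), children = list(make_cell(c(-1, 0)), make_cell(c(1, 0)))),
  make_cell(c(10, 10))
)
path_a <- predict_path(c(0.8, 0.1), codebook)
path_b <- predict_path(c(9, 9), codebook)
path_a  # 1 2: cell 1 at level 1, then its child 2 at level 2
path_b  # 2: cell 2 at level 1, which is not split further
```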

6 Example I: muHVT with the Torus dataset

In this section, we will see how to use the package to visualize multidimensional data by projecting it to two dimensions using Sammon’s projection, and how the resulting map can then be used for scoring.

Data Understanding

First of all, let us see how to generate data for the torus. We use the geozoo library for this purpose. Geo Zoo (short for Geometric Zoo) is a compilation of geometric objects ranging from three to ten dimensions. Geo Zoo contains regular or well-known objects, e.g. cube and sphere, and some abstract objects, e.g. Boy’s surface, Torus and Hyper-Torus.

Here, we will generate a 3D torus (a torus is a surface of revolution generated by revolving a circle in three-dimensional space one full revolution about an axis that is coplanar with the circle) with 12000 points.
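For intuition, a torus can also be written down directly from this definition. The base R sketch below revolves a circle of radius 1 around the z axis at distance 2; these radii are an assumption inferred from the coordinate ranges reported later in this section, and the sampling scheme is not necessarily the one geozoo uses.

```r
# Parametric torus sketch in base R (assumed radii: R = 2 for the revolution,
# r = 1 for the revolved circle; uniform angles, so not uniform on the surface).
set.seed(240)
n <- 1000
R <- 2
r <- 1
theta <- runif(n, 0, 2 * pi)  # position on the revolved circle
phi   <- runif(n, 0, 2 * pi)  # angle of revolution about the z axis
torus_sketch <- data.frame(
  x = (R + r * cos(theta)) * cos(phi),
  y = (R + r * cos(theta)) * sin(phi),
  z = r * sin(theta)
)
range(torus_sketch$x)  # stays within [-3, 3]
range(torus_sketch$z)  # stays within [-1, 1]
```

Every generated point satisfies (sqrt(x^2 + y^2) - R)^2 + z^2 = r^2, which is the implicit equation of the torus.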

Raw Torus Dataset

The torus dataset includes the columns x, y and z, which hold the coordinates of each point.

Let’s explore the raw torus dataset containing 12000 points. For the sake of brevity, we display only the first six rows.

set.seed(240)
# Here p represents dimension of object
# n represents number of points
torus <- geozoo::torus(p = 3,n = 12000)
torus_df <- data.frame(torus$points)
colnames(torus_df) <- c("x","y","z")

torus_df1 <- torus_df %>% round(4)
colnames(torus_df1) <- c("x","y","z")
torus_df1$Row.No <- as.numeric(row.names(torus_df))
torus_df1 <- torus_df1 %>% dplyr::select(Row.No,x,y,z)
Table(head(torus_df1))
Row.No x y z
1 -2.6282 0.5656 -0.7253
2 -1.4179 -0.8903 0.9455
3 -1.0308 1.1066 -0.8731
4 1.8847 0.1895 0.9944
5 -1.9506 -2.2507 0.2071
6 -1.4824 0.9229 0.9672

We will first split the torus data into train and test. We will randomly select 9000 data points as training and remaining 3000 data points as testing data.

set.seed(42)
train_indices <- sample(1:nrow(torus_df), 9000)
trainTorus <- torus_df[train_indices, ]
trainTorus_data <- trainTorus %>% round(4)
test_indices <- setdiff(1:nrow(torus_df), train_indices)
testTorus <- torus_df[test_indices, ]

Raw Training Dataset

First, let’s look at the randomly selected training data containing 9000 data points. For the sake of brevity, we display only the first six rows.


trainTorus_data$Row.No <- as.numeric(row.names(trainTorus_data))
trainTorus_data <- trainTorus_data %>% dplyr::select(Row.No,x,y,z)
row.names(trainTorus_data) <- NULL
Table(head(trainTorus_data))
Row.No x y z
10801 -0.6864 -0.8709 0.4537
2369 0.0470 -1.4714 0.8493
5273 1.4155 0.0936 0.8136
9290 0.2448 1.1402 -0.5520
1252 -2.0865 0.0771 0.9961
8826 2.9131 -0.0627 -0.4061

Now let’s have a look at structure and summary of the training data.

str(trainTorus_data)
#> 'data.frame':    9000 obs. of  4 variables:
#>  $ Row.No: num  10801 2369 5273 9290 1252 ...
#>  $ x     : num  -0.686 0.047 1.415 0.245 -2.087 ...
#>  $ y     : num  -0.8709 -1.4714 0.0936 1.1402 0.0771 ...
#>  $ z     : num  0.454 0.849 0.814 -0.552 0.996 ...
summary(trainTorus_data)
#>      Row.No            x                   y                   z            
#>  Min.   :    1   Min.   :-2.997700   Min.   :-2.995600   Min.   :-1.000000  
#>  1st Qu.: 2988   1st Qu.:-1.151025   1st Qu.:-1.118100   1st Qu.:-0.716225  
#>  Median : 5986   Median : 0.022200   Median :-0.000600   Median : 0.016950  
#>  Mean   : 5988   Mean   :-0.002215   Mean   : 0.002805   Mean   : 0.004401  
#>  3rd Qu.: 8974   3rd Qu.: 1.140325   3rd Qu.: 1.125900   3rd Qu.: 0.719875  
#>  Max.   :12000   Max.   : 2.998100   Max.   : 2.999300   Max.   : 1.000000

Raw Testing Dataset

Now, let’s have a look at the randomly selected testing dataset containing 3000 data points. For the sake of brevity, we display only the first six rows.

test_dataset <- testTorus
test_dataset1 <- round(test_dataset,4)
test_dataset1$Row.No <- row.names(test_dataset)
test_dataset1 <- test_dataset1 %>% dplyr::select(Row.No,x,y,z) 
rownames(test_dataset1) <- NULL
Table(head(test_dataset1))
Row.No x y z
6 -1.4824 0.9229 0.9672
10 0.7920 -1.3482 -0.8998
12 -2.3787 1.7986 -0.1878
17 -0.8428 -0.5436 0.0755
20 -2.6487 -0.5745 0.7040
23 -1.1130 -0.6516 -0.7040

Now let’s have a look at structure and summary of the test data.

str(test_dataset)
#> 'data.frame':    3000 obs. of  3 variables:
#>  $ x: num  -1.482 0.792 -2.379 -0.843 -2.649 ...
#>  $ y: num  0.923 -1.348 1.799 -0.544 -0.574 ...
#>  $ z: num  0.9672 -0.8998 -0.1878 0.0755 0.704 ...
summary(test_dataset)
#>        x                    y                  z            
#>  Min.   :-2.9976672   Min.   :-2.99934   Min.   :-1.000000  
#>  1st Qu.:-1.1408711   1st Qu.:-1.09877   1st Qu.:-0.700378  
#>  Median :-0.0670732   Median : 0.06562   Median : 0.012098  
#>  Mean   : 0.0008702   Mean   : 0.03297   Mean   : 0.004486  
#>  3rd Qu.: 1.1404037   3rd Qu.: 1.14810   3rd Qu.: 0.713435  
#>  Max.   : 2.9995467   Max.   : 2.98818   Max.   : 0.999999

Now let’s try to visualize the torus (donut) in 3D Space.


knitr::include_graphics('torus_donut.png')
Figure 2: 3D Torus

Note: The steps of compression, projection, and tessellation are iteratively performed until a minimum compression rate of 80% is achieved. Once the desired compression is attained, the resulting model object is used for scoring with the predictHVT() function.

In this section all the outlined workflow steps provided in the abstract section (Compression, Projection, Tessellation and Prediction) are executed at level 1.

6.1 Step 1: Data Compression

The core function for compression in the workflow is HVQ, which is called within the HVT function. It has a parameter called quantization error, which acts as a threshold and determines the number of levels in the hierarchy. If there are ‘n’ levels in the hierarchy, then all the clusters formed up to this level have a quantization error equal to or greater than the threshold quantization error. The user can define the number of clusters in the first level of the hierarchy, and each cluster in the first level is then sub-divided into the same number of clusters as there are in the first level. This process continues, and each group is divided into smaller clusters, until the threshold quantization error is met. The output of this technique is hierarchically arranged vector quantized data.

However, let’s try to comprehend the HVT function first before moving on.

HVT(
  dataset,
  min_compression_perc,
  n_cells,
  depth,
  quant.err,
  projection.scale,
  normalize = T,
  distance_metric = c("L1_Norm", "L2_Norm"),
  error_metric = c("mean", "max"),
  quant_method = c("kmeans", "kmedoids"),
  diagnose = TRUE,
  hvt_validation = FALSE,
  train_validation_split_ratio = 0.8
)

Each of the parameters of HVT function have been explained below:

We will use the HVT function to compress our data while preserving the essential features of the dataset. Our goal is to achieve data compression of at least 80%. If the compression does not meet this target, we can adjust the model parameters, for example by changing the quantization error threshold or increasing the number of cells, and then rerun the HVT function.

In our example, we will iteratively increase the number of cells until the desired compression percentage is reached. We do not increase the quantization threshold instead, because doing so may reduce the level of detail captured in the data representation.
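The idea behind this search can be sketched with base R alone before running it with the package. The loop below is a hedged stand-in that uses kmeans on made-up 2D data instead of HVT on the torus; the helper name percent_cells_below, the candidate cell counts and the toy data are assumptions, and the numbers it prints are not the ones reported in the iterations that follow.

```r
# Share of cells whose maximum L1 quantization error is within the threshold,
# for a flat (depth 1) k-means quantization. Hypothetical helper name.
percent_cells_below <- function(points, n_cells, threshold) {
  km <- kmeans(points, centers = n_cells, iter.max = 100)
  # L1 distance of every point to the centroid of its own cell
  d <- rowSums(abs(points - km$centers[km$cluster, , drop = FALSE]))
  cell_qe <- tapply(d, km$cluster, max)
  mean(cell_qe <= threshold)
}

set.seed(240)
toy <- matrix(runif(2000), ncol = 2)  # 1000 toy points, not the torus data
for (n_cells in c(10L, 50L, 200L)) {
  pct <- percent_cells_below(toy, n_cells, threshold = 0.1)
  cat(sprintf("n_cells = %d: %.0f%% of cells within threshold\n",
              n_cells, 100 * pct))
  if (pct >= 0.8) break  # stop once the 80% target is met
}
```

Increasing the number of cells shrinks each cell, so the share of cells within the threshold tends to grow while the threshold itself, and hence the level of detail, stays fixed.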

Iteration 1:

We will pass the below mentioned model parameters along with torus dataset to HVT function.

Model Parameters

set.seed(240)
hvt.torus <- muHVT::HVT(
  torus_df,
  n_cells = 100,
  depth = 1,
  quant.err = 0.1,
  projection.scale = 10,
  normalize = F,
  distance_metric = "L1_Norm",
  error_metric = "max",
  quant_method = "kmeans"
)

Let’s check out the compression summary.

compressionSummaryTable(hvt.torus[[3]]$compression_summary)
segmentLevel noOfCells noOfCellsBelowQuantizationError percentOfCellsBelowQuantizationErrorThreshold parameters
1 100 0 0 n_cells: 100 quant.err: 0.1 distance_metric: L1_Norm error_metric: max quant_method: kmeans

As can be seen from the table above, none of the 100 cells has reached the quantization error threshold. Therefore, we can further subdivide the data by increasing the n_cells parameter and then check whether the desired compression (80%) is reached.

Iteration 2:

Let’s retry by increasing the n_cells parameter to 300.

Model Parameters

set.seed(240)
hvt.torus2 <- muHVT::HVT(
  torus_df,
  n_cells = 300,
  depth = 1,
  quant.err = 0.1,
  projection.scale = 10,
  normalize = F,
  distance_metric = "L1_Norm",
  error_metric = "max",
  quant_method = "kmeans"
)

Let’s check out the compression summary again.

compressionSummaryTable(hvt.torus2[[3]]$compression_summary)
segmentLevel noOfCells noOfCellsBelowQuantizationError percentOfCellsBelowQuantizationErrorThreshold parameters
1 300 5 0.02 n_cells: 300 quant.err: 0.1 distance_metric: L1_Norm error_metric: max quant_method: kmeans

It can be observed from the table above that only 5 cells out of 300, i.e. about 2% of the cells, reached the quantization error threshold. Therefore, we can further subdivide the data by increasing the n_cells parameter and then check whether 80% compression is reached.

Iteration 3:

Since we are yet to achieve a compression of at least 80%, let’s try again by increasing the n_cells parameter to 900.

Model Parameters

set.seed(240)
hvt.torus3 <- muHVT::HVT(
  torus_df,
  n_cells = 900,
  depth = 1,
  quant.err = 0.1,
  projection.scale = 10,
  normalize = F,
  distance_metric = "L1_Norm",
  error_metric = "max",
  quant_method = "kmeans"
)

Let’s check the compression summary for torus.

compressionSummaryTable(hvt.torus3[[3]]$compression_summary)
segmentLevel noOfCells noOfCellsBelowQuantizationError percentOfCellsBelowQuantizationErrorThreshold parameters
1 900 768 0.85 n_cells: 900 quant.err: 0.1 distance_metric: L1_Norm error_metric: max quant_method: kmeans

With the number of cells increased to 900, 768 of the 900 cells (85%) fall below the quantization error threshold, which meets the 80% compression target, so we will not subdivide the cells further.

The next step is to perform data projection on the compressed data. In this step, the compressed data is transformed and projected onto a lower-dimensional space so that it can be visualized and analyzed in a more manageable form.

6.2 Step 2: Data Projection

The function sammonsProjection(), which is called in HVT, utilizes the sammon function from the MASS package. Sammon’s projection is an algorithm that maps a high-dimensional space to a space of lower dimensionality while attempting to preserve the structure of inter-point distances in the projection. The centroids are plotted in 2D after performing Sammon’s projection at every level of the tessellation.

Iteration 1

Let’s view the projected 2D coordinates after performing Sammon’s projection on the compressed data for the first iteration, where we set the n_cells parameter to 100. For the sake of brevity, we display only the first six rows.


hvt_torus_coordinates <- hvt.torus[[2]][[1]][["1"]]
# Collect the projected 2D centroid (the `pt` element) of every level-1 cell
coordinates_value <- lapply(seq_along(hvt_torus_coordinates), function(x) {
  hvt_torus_coordinates[[x]]$pt
})
centroid_coordinates <- do.call(rbind.data.frame, coordinates_value)
colnames(centroid_coordinates) <- c("x_coord","y_coord")
centroid_coordinates$Row.No <- as.numeric(row.names(centroid_coordinates)) 
centroid_coordinates <- centroid_coordinates %>% dplyr::select(Row.No,x_coord,y_coord)
centroid_coordinates1 <- centroid_coordinates %>% data.frame() %>% round(4)
Table(head(centroid_coordinates1), scroll = T, limit = 20)
Row.No x_coord y_coord
1 15.4686 9.1562
2 -12.3060 -3.5491
3 -6.9791 19.6759
4 9.5694 -0.5423
5 24.8946 17.7822
6 24.0559 6.8543

Let’s plot the 2D Sammon’s projection with n_cells set to 100 in the first iteration.

ggplot(centroid_coordinates1, aes(x_coord, y_coord)) +
  geom_point(color = "blue") +
  labs(x = "X", y = "Y")
Figure 3: Sammons 2D Plot for 100 cells

Iteration 2

Let’s view the projected 2D coordinates after performing Sammon’s projection on the compressed data for the second iteration, where we set the n_cells parameter to 300. For the sake of brevity, we display only the first six rows.


hvt_torus_coordinates <- hvt.torus2[[2]][[1]][["1"]]
# Collect the projected 2D centroid (the `pt` element) of every level-1 cell
coordinates_value <- lapply(seq_along(hvt_torus_coordinates), function(x) {
  hvt_torus_coordinates[[x]]$pt
})
centroid_coordinates <- do.call(rbind.data.frame, coordinates_value)
colnames(centroid_coordinates) <- c("x_coord","y_coord")
centroid_coordinates$Row.No <- as.numeric(row.names(centroid_coordinates)) 
centroid_coordinates <- centroid_coordinates %>% dplyr::select(Row.No,x_coord,y_coord)
centroid_coordinates2 <- centroid_coordinates %>% data.frame() %>% round(4)
Table(head(centroid_coordinates2), scroll = T, limit = 20)
Row.No x_coord y_coord
1 23.7284 5.0557
2 -11.2747 1.3672
3 11.2157 26.5876
4 8.5268 -3.7218
5 30.3534 5.0864
6 29.4938 -0.6784

Let’s plot the 2D Sammon’s projection with n_cells set to 300 in the second iteration.

ggplot(centroid_coordinates2, aes(x_coord, y_coord)) +
  geom_point(color = "blue") +
  labs(x = "X", y = "Y")
Figure 4: Sammons 2D Plot for 300 cells

Iteration 3

Let’s view the projected 2D coordinates after performing Sammon’s projection on the compressed data for the third iteration, where we set the n_cells parameter to 900. For the sake of brevity, we display only the first six rows.


hvt_torus_coordinates <- hvt.torus3[[2]][[1]][["1"]]
# Collect the projected 2D centroid (the `pt` element) of every level-1 cell
coordinates_value <- lapply(seq_along(hvt_torus_coordinates), function(x) {
  hvt_torus_coordinates[[x]]$pt
})
centroid_coordinates <- do.call(rbind.data.frame, coordinates_value)
colnames(centroid_coordinates) <- c("x_coord","y_coord")
centroid_coordinates$Row.No <- as.numeric(row.names(centroid_coordinates)) 
centroid_coordinates <- centroid_coordinates %>% dplyr::select(Row.No,x_coord,y_coord)
centroid_coordinates3 <- centroid_coordinates %>% data.frame() %>% round(4)
Table(head(centroid_coordinates3), scroll = T, limit = 20)
Row.No x_coord y_coord
1 19.2964 -18.4704
2 -5.9543 10.4406
3 25.5603 0.6926
4 1.5064 -9.0975
5 18.3666 -24.9166
6 17.3898 -22.7207

Let’s plot the 2D Sammon’s projection with n_cells set to 900 in the third iteration.


ggplot(centroid_coordinates3, aes(x_coord, y_coord)) +
  geom_point(color = "blue") +
  labs(x = "X", y = "Y")
Figure 5: Sammons 2D Plot for 900 cells

6.3 Step 3: Tessellation

The deldir package computes the Delaunay triangulation (and hence the Dirichlet or Voronoi tessellation) of a planar point set according to the second (iterative) algorithm of Lee and Schacter. For subsequent levels, a transformation is performed on the 2D coordinates so that all the points lie within their parent tile. Tessellations are plotted using these transformed points as centroids. plotHVT is the main function for plotting hierarchical Voronoi tessellations.

Now let’s try to understand plotHVT function. The parameters have been explained in detail below:

plotHVT(hvt.results, line.width, color.vec, pch1 = 21, centroid.size = 3, title = NULL, maxDepth = 1)

Iteration 1

To enhance visualization, let’s generate a plot of the Voronoi tessellation for the first iteration where we set n_cells parameter as 100. This plot will provide a visual representation of the Voronoi regions corresponding to the data points, aiding in the analysis and understanding of the data distribution.

muHVT::plotHVT(
  hvt.torus,
  line.width = c(0.4),
  color.vec = c("#141B41"),
  centroid.size = 0.6,
  maxDepth = 1
)
Figure 6: The Voronoi tessellation for layer 1 shown for the 100 cells in the dataset ’torus’

Iteration 2

Now, let’s plot the Voronoi tessellation for the second iteration where we set n_cells parameter to 300.

muHVT::plotHVT(
  hvt.torus2,
  line.width = c(0.4),
  color.vec = c("#141B41"),
  centroid.size = 0.6,
  maxDepth = 1
)
Figure 7: The Voronoi tessellation for layer 1 shown for the 300 cells in the dataset ’torus’

Iteration 3

Now, let’s plot the Voronoi tessellation again, for the third iteration where we set n_cells parameter to 900.

muHVT::plotHVT(
  hvt.torus3,
  line.width = c(0.4),
  color.vec = c("#141B41"),
  centroid.size = 0.6,
  maxDepth = 1
)
Figure 8: The Voronoi tessellation for layer 1 shown for the 900 cells in the dataset ’torus’

From the plot above, the inherent structure of the donut can be easily observed in the two-dimensional space.

We will now overlay all the features as heatmap over the Voronoi Tessellation plot for better visualization and identification of patterns, trends, and variations in the data.

Heat Maps

Let’s have a look at the hvtHmap function, which we will use to overlay a variable as a heatmap.

hvtHmap(hvt.results, dataset, child.level, hmap.cols, color.vec ,line.width, palette.color = 6)

Now let’s plot the Voronoi Tessellation with the heatmap overlaid for all the features in the torus data for better visualization and interpretation of data patterns and distributions.

The heatmaps displayed below provide a visual representation of the spatial characteristics of the torus, allowing us to observe patterns and trends in the distribution of each of the features (n, x, y and z). The green shades highlight regions with higher values in each of the heatmaps, while the indigo shades indicate areas with the lowest values. By analyzing these heatmaps, we can gain insights into the variations and relationships between these features within the torus structure.

muHVT::hvtHmap(
  hvt.torus3,
  torus_df,
  child.level = 1,
  hmap.cols = "n",
  line.width = c(0.4),
  color.vec = c("#141B41"),
  palette.color = 6,
  centroid.size = 0.8,
  show.points = T,
  quant.error.hmap = 0.1,
  n_cells.hmap = 15
)
Figure 9: The Voronoi tessellation for layer 1 and number of cells 900 with the heat map overlaid for No. of entities in each cell in the ’torus’ dataset

muHVT::hvtHmap(
  hvt.torus3,
  torus_df,
  child.level = 1,
  hmap.cols = "x",
  line.width = c(0.4),
  color.vec = c("#141B41"),
  palette.color = 6,
  centroid.size = 0.8,
  show.points = T,
  quant.error.hmap = 0.1,
  n_cells.hmap = 15
)
Figure 10: The Voronoi tessellation for layer 1 and number of cells 900 with the heat map overlaid for variable x in the ’torus’ dataset


muHVT::hvtHmap(
  hvt.torus3,
  torus_df,
  child.level = 1,
  hmap.cols = "y",
  line.width = c(0.4),
  color.vec = c("#141B41"),
  palette.color = 6,
  centroid.size = 0.8,
  show.points = T,
  quant.error.hmap = 0.1,
  n_cells.hmap = 15
)
Figure 11: The Voronoi tessellation for layer 1 and number of cells 900 with the heat map overlaid for variable y in the ’torus’ dataset


muHVT::hvtHmap(
  hvt.torus3,
  torus_df,
  child.level = 1,
  hmap.cols = "z",
  line.width = c(0.4),
  color.vec = c("#141B41"),
  palette.color = 6,
  centroid.size = 0.8,
  show.points = T,
  quant.error.hmap = 0.1,
  n_cells.hmap = 15
)
Figure 12: The Voronoi tessellation for layer 1 and number of cells 900 with the heat map overlaid for variable z in the ’torus’ dataset
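
As a rough illustration of what such an overlay encodes, the sketch below assigns toy points to cells and computes, for each cell, the number of points and the mean of one feature. These are the kind of cell-level values a heat map can be colored by. The data frame `toy`, its `cell` column and the use of `kmeans` as a stand-in for the compression step are assumptions made for illustration; they are not taken from muHVT.

```r
set.seed(240)

# Hypothetical toy data: 200 points with three features
toy <- data.frame(x = rnorm(200), y = rnorm(200), z = rnorm(200))

# Stand-in for the compression step: assign every point to one of 10 cells
toy$cell <- kmeans(toy[, c("x", "y", "z")], centers = 10)$cluster

# Cell-level values that a heat map could be colored by
cell_summary <- data.frame(
  cell   = sort(unique(toy$cell)),
  n      = as.vector(table(toy$cell)),               # points per cell
  mean_x = as.vector(tapply(toy$x, toy$cell, mean))  # mean of x per cell
)

# Cells ordered from most to least populated
cell_summary[order(-cell_summary$n), ]
```

Coloring by `n` corresponds to a density view of the map, while coloring by a feature mean shows how that feature varies across neighbouring cells.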


6.4 Step 4: Prediction (predictHVT)

Raw Testing Dataset

Let’s have a look at our test dataset (3000 points) before we pass it to the predictHVT function for scoring.

Table(head(test_dataset1))
Row.No x y z
6 -1.4824 0.9229 0.9672
10 0.7920 -1.3482 -0.8998
12 -2.3787 1.7986 -0.1878
17 -0.8428 -0.5436 0.0755
20 -2.6487 -0.5745 0.7040
23 -1.1130 -0.6516 -0.7040

Before moving on, let’s first understand the predictHVT function.

predictHVT(data,
           hvt.results,
           hmap.cols = NULL,
           child.level = 1,
           ...)

The important parameters for the function predictHVT are as below:

Now that we have built the model, let us use it to predict which cell and which level each point in our test dataset belongs to.

set.seed(240)
predictions_torus <- muHVT::predictHVT(
  testTorus,
  hvt.torus3,
  child.level = 1,
  line.width = c(1.2),
  color.vec = c("#141B41"),
  quant.error.hmap = 0.1,
  n_cells.hmap = 9000,
  normalize = F
)
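
Conceptually, scoring a new point against a codebook means finding the cell whose centroid is closest to it. The sketch below illustrates this idea in base R with a hypothetical three-cell codebook and the Manhattan (L1) distance. The names `centroids`, `new_points` and `assign_cell` are made up for illustration, and this is not a description of the internals of predictHVT.

```r
# Hypothetical codebook: three cell centroids in (x, y, z)
centroids <- rbind(
  c( 0.0, 0.0, 0.0),
  c( 1.0, 1.0, 1.0),
  c(-2.0, 0.5, 0.0)
)
colnames(centroids) <- c("x", "y", "z")

# Two new points to score
new_points <- rbind(
  c( 0.9, 1.2, 0.8),
  c(-1.5, 0.2, 0.1)
)

# For one point, return the index of the centroid with the smallest L1 distance
assign_cell <- function(point, centroids) {
  dists <- rowSums(abs(sweep(centroids, 2, point)))
  which.min(dists)
}

cell_id <- apply(new_points, 1, assign_cell, centroids = centroids)
cell_id
```

Here the first point is closest to the second centroid and the second point is closest to the third centroid.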

Let’s see which cell and level each point belongs to and check the mean absolute difference between the actual and predicted values. For the sake of brevity, we will only show the first 10 rows.

data1 <- test_dataset
data1$Row.No <- row.names(test_dataset)
data1 <- data1 %>% dplyr::select(Row.No,x,y,z)
rownames(data1) <- NULL
colnames(data1) <- c("Row.No","x_act","y_act","z_act")
data2 <- predictions_torus[["scoredPredictedData"]]
data2 <- data2 %>% dplyr::select(Cell.ID,x,y,z)
colnames(data2) <- c("Cell.ID","x_pred","y_pred","z_pred")
combined <- cbind(data1,data2)
combined$diff <- rowMeans(abs(combined[, c("x_act", "y_act", "z_act")] - combined[, c("x_pred", "y_pred", "z_pred")]))
options(scipen = 999)
combined %>% head(100) %>% 
  as.data.frame() %>%
  Table(scroll = T, limit = 10)
Row.No x_act y_act z_act Cell.ID x_pred y_pred z_pred diff
6 -1.4823709 0.9228529 0.9672467 723 -1.4824 0.9229 0.9672 0.0000410
10 0.7920450 -1.3482111 -0.8997781 252 0.7920 -1.3482 -0.8998 0.0000260
12 -2.3787465 1.7986402 -0.1878163 900 -2.3787 1.7986 -0.1878 0.0000344
17 -0.8427718 -0.5435588 0.0755262 558 -0.8428 -0.5436 0.0755 0.0000318
20 -2.6486525 -0.5744624 0.7039659 837 -2.6487 -0.5745 0.7040 0.0000397
23 -1.1130327 -0.6516259 -0.7039507 628 -1.1130 -0.6516 -0.7040 0.0000360
28 0.7520403 -2.6043863 0.7034024 140 0.7520 -2.6044 0.7034 0.0000188
30 -1.6755282 2.3358857 0.4847097 859 -1.6755 2.3359 0.4847 0.0000174
33 -1.6466922 0.4011827 -0.9523068 719 -1.6467 0.4012 -0.9523 0.0000106
34 0.7930278 2.4427927 0.8228262 458 0.7930 2.4428 0.8228 0.0000204
hist(combined$diff, breaks = 20, col = "blue", main = "Mean Absolute Difference", xlab = "Difference")
Figure 13: Mean Absolute Difference
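
The mean absolute difference plotted above is a row-wise average of the absolute gaps between actual and predicted coordinates. A minimal sketch on made-up numbers (the data frames `actual` and `predicted` are hypothetical) shows the arithmetic:

```r
# Hypothetical actual and predicted coordinates for two points
actual    <- data.frame(x = c(1.0, 2.0), y = c(0.5, -1.0), z = c(0.0, 3.0))
predicted <- data.frame(x = c(1.1, 2.0), y = c(0.5, -1.3), z = c(0.2, 3.0))

# Row 1: (0.1 + 0.0 + 0.2) / 3; Row 2: (0.0 + 0.3 + 0.0) / 3
mad_per_row <- rowMeans(abs(actual - predicted))
mad_per_row
```

A value close to zero means the point lies very near the centroid of the cell it was assigned to.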


7 Example II: muHVT with the Personal Computer dataset

Data Understanding

In this section, we will use the Prices of Personal Computers dataset. This dataset contains 6259 observations and 10 features. It records the prices of 486 personal computers in the US from 1993 to 1995. The variables are price, speed, ram, screen, cd, etc. The dataset can be downloaded from here.

In this example, we will compress this dataset by using hierarchical VQ via k-means and visualize the Voronoi Tessellation plots using Sammons projection. Later on, we will overlay all the variables as a heatmap to generate further insights.

Here, we load the data and store it in a variable named computers.

set.seed(240)
# Load data from csv files
computers <- read.csv("https://raw.githubusercontent.com/Mu-Sigma/muHVT/master/vignettes/sample_dataset/Computers.csv")

Raw Personal Computers Dataset

The Computers dataset includes the following columns:

Let’s explore the Personal Computers dataset (6259 points). For the sake of brevity, we display only the first six rows.

# Quick peek
Table(head(computers), scroll = T, limit = 20)
X price speed hd ram screen cd multi premium ads trend
1 1499 25 80 4 14 no no yes 94 1
2 1795 33 85 2 14 no no yes 94 1
3 1595 25 170 4 15 no no yes 94 1
4 1849 25 170 8 14 no no no 94 1
5 3295 33 340 16 14 no no yes 94 1
6 3695 66 340 16 14 no no yes 94 1

Now, let us check the structure of the data and analyse its summary.

str(computers)
#> 'data.frame':    6259 obs. of  11 variables:
#>  $ X      : int  1 2 3 4 5 6 7 8 9 10 ...
#>  $ price  : int  1499 1795 1595 1849 3295 3695 1720 1995 2225 2575 ...
#>  $ speed  : int  25 33 25 25 33 66 25 50 50 50 ...
#>  $ hd     : int  80 85 170 170 340 340 170 85 210 210 ...
#>  $ ram    : int  4 2 4 8 16 16 4 2 8 4 ...
#>  $ screen : int  14 14 15 14 14 14 14 14 14 15 ...
#>  $ cd     : chr  "no" "no" "no" "no" ...
#>  $ multi  : chr  "no" "no" "no" "no" ...
#>  $ premium: chr  "yes" "yes" "yes" "no" ...
#>  $ ads    : int  94 94 94 94 94 94 94 94 94 94 ...
#>  $ trend  : int  1 1 1 1 1 1 1 1 1 1 ...
summary(computers)
#>        X            price          speed              hd        
#>  Min.   :   1   Min.   : 949   Min.   : 25.00   Min.   :  80.0  
#>  1st Qu.:1566   1st Qu.:1794   1st Qu.: 33.00   1st Qu.: 214.0  
#>  Median :3130   Median :2144   Median : 50.00   Median : 340.0  
#>  Mean   :3130   Mean   :2220   Mean   : 52.01   Mean   : 416.6  
#>  3rd Qu.:4694   3rd Qu.:2595   3rd Qu.: 66.00   3rd Qu.: 528.0  
#>  Max.   :6259   Max.   :5399   Max.   :100.00   Max.   :2100.0  
#>       ram             screen           cd               multi          
#>  Min.   : 2.000   Min.   :14.00   Length:6259        Length:6259       
#>  1st Qu.: 4.000   1st Qu.:14.00   Class :character   Class :character  
#>  Median : 8.000   Median :14.00   Mode  :character   Mode  :character  
#>  Mean   : 8.287   Mean   :14.61                                        
#>  3rd Qu.: 8.000   3rd Qu.:15.00                                        
#>  Max.   :32.000   Max.   :17.00                                        
#>    premium               ads            trend      
#>  Length:6259        Min.   : 39.0   Min.   : 1.00  
#>  Class :character   1st Qu.:162.5   1st Qu.:10.00  
#>  Mode  :character   Median :246.0   Median :16.00  
#>                     Mean   :221.3   Mean   :15.93  
#>                     3rd Qu.:275.0   3rd Qu.:21.50  
#>                     Max.   :339.0   Max.   :35.00

Let us first split the data into train and test sets. We will use the first 80% of the rows for training and the remainder for testing.

noOfPoints <- dim(computers)[1]
trainLength <- as.integer(noOfPoints * 0.8)
trainComputers <- computers[1:trainLength,]
testComputers <- computers[(trainLength+1):noOfPoints,]
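
The arithmetic of this split can be checked in isolation. The sketch below reproduces the row counts and, as an optional alternative that is not used in this vignette, shows how a random split of the same sizes could be drawn with `sample`; `trainIdx` and `testIdx` are hypothetical names.

```r
noOfPoints  <- 6259
trainLength <- as.integer(noOfPoints * 0.8)  # 5007.2 is truncated to 5007
testLength  <- noOfPoints - trainLength      # remaining 1252 rows

c(train = trainLength, test = testLength)

# Optional alternative (an assumption, not the vignette's method): random split
set.seed(240)
trainIdx <- sample(seq_len(noOfPoints), size = trainLength)
testIdx  <- setdiff(seq_len(noOfPoints), trainIdx)
```

The sequential split above keeps the original row order, so the test set consists of the last 1252 rows of the file.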

K-means is not suitable for factor variables because the sample space for factor variables is discrete, and a Euclidean distance function on such a space is not meaningful. Hence, we will remove the categorical variables (cd, multi, premium) from our dataset, together with the row index X and the trend column.

Here we keep the original trainComputers and testComputers as we will use the variables from this dataset to overlay as heatmap and generate some insights.

trainComputers <-
  trainComputers %>% dplyr::select(-c(X, cd, multi, premium, trend))
testComputers <-
  testComputers %>% dplyr::select(-c(X, cd, multi, premium, trend))
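
The same kind of filtering can also be done by column type rather than by name. The sketch below is a base R illustration on a small hypothetical data frame `toy` that mimics a few columns of the Computers data; it keeps only numeric columns and then drops the row index.

```r
# Hypothetical miniature of the Computers data
toy <- data.frame(
  X       = 1:4,
  price   = c(1499, 1795, 1595, 1849),
  speed   = c(25, 33, 25, 25),
  cd      = c("no", "no", "no", "no"),
  premium = c("yes", "yes", "yes", "no"),
  stringsAsFactors = FALSE
)

# Keep numeric columns only; character or factor columns are dropped
is_num <- vapply(toy, is.numeric, logical(1))
numeric_only <- toy[, is_num, drop = FALSE]

# The row index is numeric but is an identifier, not a feature
numeric_only$X <- NULL

names(numeric_only)
```

Selecting by type does not catch numeric identifiers such as X, which is why explicit removal by name is still needed for those columns.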

Raw Training Dataset

Now, let’s have a look at the raw training dataset (5007 data points). For the sake of brevity, we display only the first six rows.

trainComputers_data <- trainComputers %>% as.data.frame() %>% round(4)
trainComputers_data$Row.No <- as.numeric(row.names(trainComputers_data))
trainComputers_data <- trainComputers_data %>% dplyr::select(Row.No,price,speed,hd,ram,screen,ads)
Table(head(trainComputers_data))
Row.No price speed hd ram screen ads
1 1499 25 80 4 14 94
2 1795 33 85 2 14 94
3 1595 25 170 4 15 94
4 1849 25 170 8 14 94
5 3295 33 340 16 14 94
6 3695 66 340 16 14 94

Raw Testing Dataset

Now, let’s have a look at the raw testing dataset (1252 data points). For the sake of brevity, we display only the first six rows.

#testComputers <- scale(testComputers, center = scale_attr$`scaled:center`, scale = scale_attr$`scaled:scale`) 
testComputers_data <- testComputers %>% as.data.frame() %>% round(4)
testComputers_data$Row.No <- as.numeric(row.names(testComputers_data))
testComputers_data <- testComputers_data %>% dplyr::select(Row.No,price,speed,hd,ram,screen,ads)
rownames(testComputers_data) <- NULL
Table(head(testComputers_data))
Row.No price speed hd ram screen ads
5008 1540 33 214 4 15 191
5009 3094 50 1000 24 15 191
5010 1794 50 214 4 14 191
5011 2408 100 270 4 14 191
5012 2454 66 720 16 15 191
5013 1969 66 1000 8 14 191

Now that we are familiar with the structure of the computers data, we will follow the steps below to get predictions using the Computers dataset.

7.1 Step 1: Data Compression

For more detailed information on Data Compression please refer to section 2 of this vignette.

We will use the HVT function to compress our data while preserving essential features of the dataset. Our goal is to achieve a compression of at least 80%. In situations where the compression ratio does not meet the desired target, we can adjust the model parameters, such as the quantization error threshold or the number of cells, and then rerun the HVT function.

In our example, we will iteratively increase the number of cells until the desired compression percentage is reached, rather than increasing the quantization error threshold, because a higher threshold may reduce the level of detail captured in the data representation.
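
The iterative search described above can be sketched as a simple loop. The example below is only an illustration on random toy data: it uses `stats::kmeans` as a stand-in for the HVT function, and the helper `cell_error` (largest mean absolute deviation of a cell's points from its centroid) is an assumed, simplified error measure rather than muHVT's exact computation. The names `toy`, `share_below`, `max_cells` and the step size of 20 are likewise hypothetical.

```r
set.seed(240)
toy <- scale(matrix(rnorm(600), ncol = 2))  # 300 standardized points, 2 features

# Assumed stand-in for a per-cell quantization error
cell_error <- function(points, centroid) {
  max(rowMeans(abs(sweep(points, 2, centroid))))
}

# Share of cells whose error is at or below the threshold for a given cell count
share_below <- function(data, k, threshold) {
  fit <- suppressWarnings(kmeans(data, centers = k, iter.max = 50))
  errs <- vapply(seq_len(k), function(i) {
    cell_error(data[fit$cluster == i, , drop = FALSE], fit$centers[i, ])
  }, numeric(1))
  mean(errs <= threshold)
}

target    <- 0.8   # desired share of cells below the threshold
threshold <- 0.2   # quantization error threshold
max_cells <- 200   # safety bound so the loop always terminates
k         <- 20

repeat {
  pct <- share_below(toy, k, threshold)
  if (pct >= target || k >= max_cells) break
  k <- k + 20      # add more cells and try again
}

c(cells = k, share_below_threshold = round(pct, 2))
```

Keeping the threshold fixed and growing the number of cells trades a larger codebook for a finer representation of the data.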

We will pass the model parameters listed below, along with the computers dataset, to the HVT function.

Model Parameters

set.seed(240)
hvt.results <- list()
hvt.results <- muHVT::HVT(trainComputers,   
                          n_cells = 440,
                          depth = 1,
                          quant.err = 0.2,
                          projection.scale = 10,
                          normalize = T,
                          distance_metric = "L1_Norm",
                          error_metric = "max",
                          quant_method = "kmeans",
                          diagnose = F)
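
The call above sets normalize = T. Assuming this standardizes each column to zero mean and unit variance, the effect can be sketched with base R's `scale`. The sketch also shows how the centering and scaling values learned on training data can be reused on new data. The small data frames `train` and `test` are hypothetical excerpts, not the full dataset.

```r
# Hypothetical excerpts of training and test data
train <- data.frame(price = c(1499, 1795, 1595, 1849), speed = c(25, 33, 25, 25))
test  <- data.frame(price = c(1540, 3094), speed = c(33, 50))

# Standardize the training columns (z-scores)
train_scaled <- scale(train)

# Reuse the training mean and standard deviation on the test data
center <- attr(train_scaled, "scaled:center")
spread <- attr(train_scaled, "scaled:scale")
test_scaled <- scale(test, center = center, scale = spread)

round(train_scaled, 2)
round(test_scaled, 2)
```

Scaling the test data with the training statistics keeps both sets on the same footing, so distances to the training centroids remain comparable.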

Now let’s check the compression summary. For each level, the table below shows the number of cells, the number of cells with a quantization error below the threshold, and the percentage of cells with a quantization error below the threshold.

compressionSummaryTable(hvt.results[[3]]$compression_summary)
segmentLevel noOfCells noOfCellsBelowQuantizationError percentOfCellsBelowQuantizationErrorThreshold parameters
1 440 358 0.81 n_cells: 440 quant.err: 0.2 distance_metric: L1_Norm error_metric: max quant_method: kmeans

As can be seen from the table above, 81% of the cells have a quantization error below the threshold. Since we have attained the desired compression percentage, we will not subdivide the cells further.
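
The percentage reported in the table follows directly from the two cell counts. A quick check of the arithmetic, using the values from the compression summary above (`target_met` is a hypothetical name):

```r
noOfCells                       <- 440
noOfCellsBelowQuantizationError <- 358

pct <- noOfCellsBelowQuantizationError / noOfCells
round(pct, 2)              # share of cells below the threshold

target_met <- pct >= 0.8   # compare against the 80% goal
target_met
```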

hvt.results[[3]] gives us detailed information about the hierarchical vector quantized data.

hvt.results[[3]][['summary']] gives tabular data containing the number of points, the quantization error and the codebook.

The datatable displayed below is the summary from hvt.results.

summaryTable(hvt.results[[3]]$summary)
Segment.Level Segment.Parent Segment.Child n Cell.ID Quant.Error price speed hd ram screen ads
1 1 1 5 153 0.09 -1.16 0.09 -0.08 -0.72 -0.61 -0.52
1 1 2 14 122 0.03 -0.61 -0.78 -0.68 -0.72 -0.61 0.37
1 1 3 11 370 0.27 2.58 -0.31 0.40 0.06 -0.61 -2.20
1 1 4 16 170 0.13 0.37 -0.81 -0.59 0.06 -0.61 -2.30
1 1 5 14 116 0.17 -0.76 -0.93 -0.77 -0.75 0.55 0.81
1 1 6 11 10 0.14 -1.49 -1.05 -1.18 -1.00 -0.61 1.52
1 1 7 10 78 0.05 -0.62 -1.20 -1.12 -0.72 -0.61 0.17
1 1 8 10 20 0.11 -1.35 -1.20 -1.10 -0.80 -0.61 -1.08
1 1 9 13 187 0.15 -0.31 0.09 -0.64 -0.72 0.55 0.37
1 1 10 20 369 0.24 0.33 0.92 0.23 -0.05 2.88 0.37
1 1 11 10 315 0.25 0.86 0.92 -0.30 -0.01 0.55 -1.26
1 1 12 5 167 0.09 -1.00 -0.78 0.16 -0.72 0.55 -0.45
1 1 13 10 75 0.05 -0.92 -1.20 -0.68 -0.72 -0.61 0.87
1 1 14 9 272 0.17 -0.90 0.92 0.06 0.06 0.55 -0.64
1 1 15 10 184 0.15 -0.98 0.92 -0.04 -0.72 -0.61 0.45
1 1 16 14 105 0.09 -1.50 -0.78 -0.08 -0.72 -0.61 -0.09
1 1 17 19 44 0.13 -0.09 0.92 -1.03 -0.74 -0.61 -2.29
1 1 18 7 276 0.11 0.43 0.09 0.49 0.06 -0.61 -0.56
1 1 19 6 146 0.13 -1.33 -0.92 -0.08 0.06 -0.61 -0.92
1 1 20 10 383 0.24 0.45 0.96 0.72 1.63 0.55 -0.70
1 1 21 5 359 0.04 0.56 0.92 0.82 1.63 -0.61 -0.30
1 1 22 9 380 0.09 0.79 0.92 0.82 1.63 -0.61 1.29
1 1 23 6 128 0.13 -1.09 -0.78 0.44 -0.72 -0.61 -1.11
1 1 24 15 107 0.07 -0.61 -1.20 -0.68 -0.72 -0.61 0.14
1 1 25 6 304 0.21 -0.01 2.67 -0.08 -0.20 -0.61 0.92
1 1 26 10 62 0.07 -1.25 -0.78 -0.99 -0.72 -0.61 0.94
1 1 27 9 342 0.11 0.17 2.67 0.26 0.06 -0.61 1.52
1 1 28 20 310 0.33 0.01 -0.80 -0.03 -0.09 2.88 0.17
1 1 29 10 422 0.06 1.35 -0.78 3.06 3.19 -0.61 -0.11
1 1 30 15 413 0.32 1.23 2.67 0.78 1.63 0.55 1.11
1 1 31 13 245 0.12 -0.03 -0.81 0.11 0.06 0.55 0.15
1 1 32 12 341 0.27 -0.26 0.92 -0.06 -0.39 2.88 -0.46
1 1 33 15 142 0.06 -0.07 -0.78 -0.68 -0.72 -0.61 0.15
1 1 34 17 364 0.08 1.27 0.09 0.83 1.63 -0.61 0.71
1 1 35 4 388 0.22 2.15 0.92 0.80 -0.13 0.55 -1.23
1 1 36 11 53 0.08 -1.65 -1.20 -0.66 -0.72 -0.61 0.32
1 1 37 8 45 0.09 -1.29 -1.14 -0.89 -0.72 0.55 0.84
1 1 38 14 338 0.26 1.21 0.92 0.58 0.01 0.55 0.06
1 1 39 20 385 0.08 1.44 0.92 0.83 1.63 -0.61 0.70
1 1 40 13 123 0.08 -0.45 0.09 -1.18 -1.11 -0.61 0.10
1 1 41 30 387 0.19 0.31 -0.78 1.73 1.63 0.55 -0.86
1 1 42 6 106 0.18 -0.68 -0.85 -0.66 -0.72 0.55 1.52
1 1 43 12 85 0.13 -0.44 -0.82 -0.92 -0.72 -0.61 -1.08
1 1 44 6 365 0.13 1.76 0.92 0.82 0.06 0.55 0.84
1 1 45 15 328 0.07 0.66 -0.78 0.82 1.63 -0.61 0.31
1 1 46 12 219 0.13 -0.76 -0.78 0.16 0.06 0.55 0.43
1 1 47 11 300 0.19 1.18 0.92 -0.46 -0.72 0.55 0.32
1 1 48 6 70 0.06 -1.65 -0.78 -0.69 -0.72 -0.61 0.25
1 1 49 4 425 0.05 1.17 -0.78 3.06 3.19 0.55 0.27
1 1 50 25 263 0.1 -0.34 0.92 0.30 0.06 -0.61 -0.17
1 1 51 11 281 0.13 0.09 0.92 0.30 0.06 -0.61 0.95
1 1 52 14 49 0.05 -1.02 -0.78 -1.18 -1.11 -0.61 0.80
1 1 53 6 265 0.13 0.42 -0.78 0.77 0.06 -0.61 -1.00
1 1 54 3 261 0.05 1.21 -0.78 -0.08 0.06 -0.61 0.50
1 1 55 10 358 0.25 2.13 0.92 0.73 -0.72 0.55 0.30
1 1 56 10 114 0.1 -0.22 -0.78 -1.14 -0.72 -0.61 0.73
1 1 57 5 55 0.02 -1.55 -0.78 -1.08 -0.72 -0.61 0.06
1 1 58 18 229 0.07 -0.02 -0.78 0.33 0.06 -0.61 0.35
1 1 59 5 402 0.18 1.12 0.92 -0.42 0.06 2.88 -1.19
1 1 60 31 211 0.17 0.28 0.09 -0.57 0.06 -0.61 -2.21
1 1 61 19 414 0.32 1.23 0.96 0.59 1.63 2.88 0.48
1 1 62 22 409 0.42 1.00 2.67 1.09 1.63 -0.40 -0.09
1 1 63 16 395 0.36 1.35 0.56 -0.25 1.63 -0.18 -2.26
1 1 64 4 191 0.1 -1.14 0.09 -0.08 0.06 -0.61 -0.42
1 1 65 10 93 0.06 -0.67 -1.20 -0.65 -0.72 -0.61 -0.44
1 1 66 10 83 0.05 -0.89 -0.78 -1.12 -0.72 -0.61 0.49
1 1 67 13 140 0.11 -1.00 -0.78 0.23 -0.72 -0.61 -0.34
1 1 68 14 76 0.07 -1.09 -1.20 -0.68 -0.72 -0.61 0.31
1 1 69 12 22 0.13 -1.40 -0.99 -1.18 -1.11 0.55 0.71
1 1 70 10 427 0.08 1.53 0.92 3.06 3.19 -0.61 -0.11
1 1 71 19 384 0.15 1.45 0.92 0.73 1.63 -0.61 -0.04
1 1 72 4 335 0.16 1.86 -0.13 0.45 0.06 0.55 0.26
1 1 73 7 230 0.17 -0.93 0.92 0.09 -0.72 0.55 -0.38
1 1 74 11 214 0.09 -0.36 -0.78 0.42 0.06 -0.61 0.95
1 1 75 21 381 0.17 1.26 0.92 -0.07 1.63 0.55 0.40
1 1 76 8 406 0.28 0.26 0.92 1.59 0.06 2.88 -0.66
1 1 77 8 327 0.18 -0.29 0.92 1.78 0.06 0.40 0.07
1 1 78 22 198 0.2 0.24 -0.78 -0.64 -0.04 0.55 -2.31
1 1 79 9 158 0.12 -0.38 -0.78 -0.57 0.06 -0.61 -1.41
1 1 80 9 237 0.12 -0.70 0.92 0.05 0.06 -0.61 0.12
1 1 81 9 52 0.07 -1.05 -1.20 -1.14 -0.72 -0.61 0.76
1 1 82 5 318 0.19 0.91 0.59 0.84 0.06 -0.61 -1.08
1 1 83 5 255 0.05 -0.41 0.09 0.82 0.06 -0.61 -1.30
1 1 84 18 177 0.14 0.17 0.09 -0.67 -0.72 -0.61 -0.02
1 1 85 10 2 0.07 -1.14 -0.78 -1.29 -1.11 -0.61 -2.28
1 1 86 8 17 0.1 -1.18 -0.78 -1.18 -1.11 -0.61 -1.37
1 1 87 3 440 0.04 5.26 0.92 4.01 4.76 2.88 0.46
1 1 88 15 80 0.06 -1.20 -0.78 -0.67 -0.72 -0.61 0.90
1 1 89 9 98 0.11 -1.23 0.09 -0.88 -0.72 -0.61 0.81
1 1 90 11 294 0.13 0.89 0.92 -0.15 0.06 -0.61 0.10
1 1 91 5 102 0.02 -0.79 -0.78 -0.89 -0.72 -0.61 0.07
1 1 92 17 408 0.35 2.21 0.92 0.16 -0.07 2.88 0.54
1 1 93 19 309 0.11 -0.08 0.92 0.87 0.06 0.55 0.33
1 1 94 13 57 0.06 -1.02 -0.78 -1.18 -1.11 -0.61 0.12
1 1 95 13 232 0.12 0.08 -0.88 0.41 0.06 -0.61 -0.43
1 1 96 8 355 0.12 1.36 0.92 -0.15 0.06 0.55 -2.33
1 1 97 19 251 0.12 -0.64 0.92 0.27 0.06 -0.61 -0.69
1 1 98 12 278 0.3 -0.16 2.67 -0.10 -0.52 -0.61 -0.31
1 1 99 13 289 0.09 0.15 0.09 0.33 0.06 0.55 0.38
1 1 100 12 175 0.21 -0.82 0.09 -0.43 -0.75 0.55 -0.56
1 1 101 19 436 0.41 1.99 2.67 3.06 3.11 -0.43 0.06
1 1 102 20 215 0.15 -0.39 -0.91 0.43 0.06 -0.61 1.52
1 1 103 23 125 0.09 -0.49 -0.78 -0.67 -0.72 -0.61 0.85
1 1 104 21 323 0.21 0.60 0.94 0.61 0.03 0.55 0.72
1 1 105 9 126 0.12 -1.10 0.09 -0.63 -0.72 -0.61 -0.70
1 1 106 17 86 0.06 -1.29 -0.78 -0.68 -0.72 -0.61 0.10
1 1 107 7 132 0.21 -0.74 0.45 -1.00 -0.94 0.55 -1.31
1 1 108 2 253 0.02 -0.51 -0.78 0.82 0.06 0.55 -1.30
1 1 109 6 109 0.15 -0.63 -0.92 -0.68 -0.72 0.55 -1.22
1 1 110 12 208 0.12 -0.64 0.92 0.22 -0.72 -0.61 -0.34
1 1 111 11 87 0.08 -1.52 -0.78 -0.08 -0.72 -0.61 -0.68
1 1 112 13 193 0.16 0.71 -0.78 -0.19 -0.72 -0.61 0.48
1 1 113 8 284 0.22 -0.26 2.67 -0.73 -0.82 0.55 1.02
1 1 114 8 176 0.1 -0.53 0.92 -1.18 -1.11 0.55 0.44
1 1 115 12 296 0.12 -0.37 0.92 0.40 0.06 0.55 -0.42
1 1 116 20 393 0.26 0.87 0.92 0.71 1.63 0.55 1.24
1 1 117 14 121 0.09 -0.20 -0.78 -1.12 -0.72 -0.61 0.11
1 1 118 12 77 0.06 -1.49 -0.78 -0.64 -0.72 -0.61 0.49
1 1 119 12 108 0.04 -0.94 -0.78 -0.66 -0.72 -0.61 0.09
1 1 120 14 277 0.11 -0.51 0.92 0.83 0.06 -0.61 -0.70
1 1 121 20 227 0.05 0.00 -0.78 0.33 0.06 -0.61 0.77
1 1 122 11 115 0.11 -0.20 -1.20 -0.89 -0.72 -0.61 0.50
1 1 123 11 189 0.08 -0.44 -0.78 -0.08 0.06 -0.61 0.80
1 1 124 5 185 0.09 0.18 0.92 -1.15 -0.72 -0.61 0.75
1 1 125 7 340 0.1 1.56 0.92 0.83 0.06 -0.61 1.07
1 1 126 7 188 0.12 0.72 -0.78 -0.28 -0.72 -0.61 -0.16
1 1 127 16 69 0.14 -0.86 -0.83 -0.72 -0.72 -0.61 1.52
1 1 128 26 104 0.05 -0.84 -0.78 -0.65 -0.72 -0.61 0.86
1 1 129 12 25 0.12 -1.50 -1.16 -1.15 -0.78 -0.61 -0.46
1 1 130 6 35 0.12 0.41 -0.78 -0.77 -0.72 -0.61 -2.33
1 1 131 22 131 0.12 -0.19 -0.90 -0.56 0.06 -0.61 -2.30
1 1 132 5 218 0.08 -0.85 -0.78 0.24 0.06 0.55 -0.33
1 1 133 6 280 0.16 0.22 -0.78 0.89 -0.07 0.55 0.48
1 1 134 24 386 0.13 1.20 0.92 0.45 1.63 0.55 0.43
1 1 135 18 361 0.16 1.29 0.09 0.72 1.63 -0.61 -0.01
1 1 136 8 24 0.06 -1.52 -1.20 -1.18 -1.11 -0.61 0.33
1 1 137 13 3 0.08 -1.30 -1.20 -0.94 -0.72 -0.61 -2.29
1 1 138 21 339 0.12 1.09 -0.80 0.68 1.63 -0.61 0.11
1 1 139 17 23 0.07 -0.39 -0.78 -1.01 -0.72 -0.61 -2.28
1 1 140 3 252 0.02 -0.45 -0.78 0.82 0.06 0.55 0.07
1 1 141 6 7 0.09 -0.91 0.09 -1.25 -1.11 -0.61 -2.06
1 1 142 20 270 0.11 0.46 0.92 -0.58 0.06 -0.61 -2.28
1 1 143 10 286 0.28 1.26 0.59 -0.73 -0.33 -0.61 -2.28
1 1 144 10 66 0.07 -1.55 -0.78 -0.63 -0.72 -0.61 0.87
1 1 145 9 202 0.06 -0.42 -1.20 0.33 0.06 -0.61 0.35
1 1 146 16 97 0.08 -0.92 -0.78 -0.67 -0.72 -0.61 -0.55
1 1 147 19 141 0.18 -0.64 -0.87 -0.61 -0.72 0.55 0.22
1 1 148 8 362 0.08 0.73 0.92 0.82 1.63 -0.61 0.43
1 1 149 12 92 0.05 -0.73 -0.78 -1.12 -0.72 -0.61 0.13
1 1 150 15 36 0.11 -1.42 -0.92 -0.69 -0.72 -0.61 1.52
1 1 151 8 258 0.17 0.50 0.09 -0.15 0.06 -0.61 1.03
1 1 152 11 291 0.11 0.73 0.09 -0.15 0.06 0.55 0.19
1 1 153 14 399 0.26 1.62 0.92 0.68 1.63 0.55 -0.25
1 1 154 2 250 0.01 -0.64 -0.78 0.82 0.06 0.55 -0.84
1 1 155 9 40 0.09 -0.56 1.38 -1.18 -1.11 -0.61 1.24
1 1 156 15 331 0.15 0.24 -1.03 0.82 1.63 -0.61 1.35
1 1 157 9 11 0.15 -0.74 -0.92 -0.76 -0.72 0.55 -2.35
1 1 158 20 181 0.16 -0.63 0.09 0.16 -0.72 -0.61 0.52
1 1 159 13 150 0.14 -1.05 0.92 -0.74 -0.72 -0.61 0.68
1 1 160 8 366 0.08 0.76 2.67 0.39 0.06 -0.61 1.52
1 1 161 15 382 0.27 2.45 0.92 0.67 0.06 0.55 0.28
1 1 162 18 332 0.09 0.62 -1.20 0.83 1.63 -0.61 0.74
1 1 163 7 301 0.1 0.30 0.92 0.82 0.06 -0.61 -0.69
1 1 164 17 347 0.32 0.84 -0.90 -0.13 1.63 -0.48 -2.27
1 1 165 15 321 0.18 0.12 0.95 0.55 0.06 0.55 1.45
1 1 166 16 130 0.17 -1.10 -0.81 -0.03 -0.72 -0.61 0.41
1 1 167 6 426 0.31 2.85 0.07 -0.06 1.63 2.88 -1.28
1 1 168 14 377 0.26 1.08 0.09 0.03 0.06 2.88 0.63
1 1 169 14 269 0.29 0.40 -0.87 -0.71 -0.72 2.88 1.12
1 1 170 17 8 0.28 -0.68 -0.90 -0.51 -0.72 2.88 0.88
1 1 171 18 268 0.11 0.42 0.09 0.36 0.06 -0.61 0.73
1 1 172 12 6 0.08 -0.73 -1.20 -1.02 -0.72 -0.61 -2.33
1 1 173 7 293 0.16 0.91 0.92 -0.20 -0.05 -0.61 0.88
1 1 174 11 267 0.12 0.09 0.09 -0.16 0.06 0.55 0.68
1 1 175 5 242 0.11 -0.22 -0.95 0.33 0.06 0.55 -0.32
1 1 176 8 96 0.05 -0.78 -0.78 -0.89 -0.72 -0.61 0.78
1 1 177 6 410 0.4 1.53 2.67 1.98 0.06 -0.61 -0.49
1 1 178 16 152 0.11 -0.96 0.92 -0.82 -0.72 -0.61 0.08
1 1 179 4 94 0.07 -1.64 -0.78 -0.08 -0.72 -0.61 0.60
1 1 180 11 90 0.12 -1.25 -0.93 -0.71 -0.72 0.55 0.11
1 1 181 9 297 0.21 0.68 -0.01 -0.06 0.06 0.55 -0.58
1 1 182 10 343 0.1 0.96 -0.91 0.82 1.63 -0.61 -0.44
1 1 183 13 326 0.1 0.65 -1.13 0.62 1.63 -0.61 0.09
1 1 184 7 233 0.23 0.54 -0.78 -0.14 -0.49 0.55 0.81
1 1 185 7 346 0.19 2.21 0.92 0.69 -0.16 -0.61 0.09
1 1 186 16 266 0.12 0.58 0.09 0.10 0.06 -0.61 0.16
1 1 187 10 417 0.06 1.03 -0.78 3.06 3.19 -0.61 -0.11
1 1 188 2 216 0.04 -1.30 0.09 0.90 -0.72 0.55 -1.07
1 1 189 14 357 0.13 0.40 2.67 0.51 0.06 0.55 0.44
1 1 190 7 15 0.17 -1.42 -0.96 -0.89 -0.83 0.55 1.52
1 1 191 9 144 0.12 -1.30 0.09 -0.22 -0.72 -0.61 0.29
1 1 192 3 236 0.12 -1.02 0.37 -0.08 0.06 0.55 0.08
1 1 193 10 168 0.14 -0.83 0.09 -0.60 -0.72 0.55 0.75
1 1 194 14 305 0.16 -0.42 0.92 0.67 0.06 0.55 -1.17
1 1 195 11 244 0.13 -0.16 0.09 0.30 0.06 -0.61 0.83
1 1 196 15 207 0.12 -0.43 -0.78 -0.29 0.06 0.55 0.76
1 1 197 20 194 0.08 0.03 0.92 -0.66 -0.72 -0.61 0.75
1 1 198 16 350 0.24 0.75 -0.86 0.62 1.63 0.55 -0.05
1 1 199 22 390 0.19 0.94 0.92 0.02 0.06 2.88 0.54
1 1 200 11 391 0.28 1.26 0.77 0.08 1.63 0.55 -1.07
1 1 201 3 29 0.09 -1.97 -0.78 -0.50 -1.11 -0.61 -0.16
1 1 202 3 411 0.19 1.90 0.08 0.87 0.06 2.88 -1.08
1 1 203 20 186 0.16 0.08 0.09 -0.40 -0.72 -0.61 0.71
1 1 204 4 437 0.44 2.29 0.92 3.35 1.63 2.88 -0.05
1 1 205 4 392 0.04 1.09 1.38 0.82 1.63 -0.61 1.01
1 1 206 13 398 0.11 0.52 0.09 1.73 1.63 0.55 -0.93
1 1 207 9 375 0.26 0.83 -0.20 -0.45 -0.02 2.88 -1.23
1 1 208 4 303 0.09 2.02 -0.78 0.46 0.06 -0.61 0.52
1 1 209 5 164 0.24 -1.41 -0.95 0.10 0.06 -0.61 -0.08
1 1 210 14 290 0.13 0.09 0.92 -0.20 0.06 0.55 0.76
1 1 211 13 273 0.11 -0.48 0.92 0.58 0.06 -0.61 -1.30
1 1 212 9 82 0.07 -0.94 0.09 -1.18 -1.11 -0.61 0.41
1 1 213 2 372 0.07 0.22 1.15 -0.08 1.63 0.55 1.52
1 1 214 5 16 0.15 -0.63 -0.95 -1.11 0.06 -0.61 -2.37
1 1 215 8 416 0.36 3.16 0.92 0.72 0.06 2.88 0.66
1 1 216 14 41 0.06 -1.47 -1.20 -1.14 -0.72 -0.61 0.51
1 1 217 7 71 0.06 -1.28 -0.78 -1.00 -0.72 -0.61 0.09
1 1 218 15 203 0.12 -0.49 -0.81 0.28 0.06 -0.61 -0.28
1 1 219 5 21 0.1 -0.53 0.09 -1.14 -0.80 -0.61 -2.32
1 1 220 15 403 0.27 2.81 0.92 0.52 0.06 -0.30 -2.34
1 1 221 4 352 0.1 0.20 -0.99 0.82 1.63 0.55 1.27
1 1 222 6 317 0.13 1.17 0.92 0.19 0.06 -0.61 1.35
1 1 223 11 118 0.05 -0.71 -0.78 -0.63 -0.72 -0.61 0.10
1 1 224 14 155 0.11 -0.45 0.09 -0.72 -0.72 -0.61 0.31
1 1 225 7 169 0.1 -0.32 -0.84 -0.61 0.06 -0.61 0.87
1 1 226 16 371 0.69 0.41 -0.78 3.68 0.02 -0.40 -0.09
1 1 227 11 112 0.13 -0.91 0.92 -0.87 -0.86 -0.61 1.52
1 1 228 6 394 0.3 0.64 0.92 1.95 1.63 -0.61 0.01
1 1 229 11 312 0.13 0.94 0.92 -0.12 0.06 0.55 0.02
1 1 230 16 407 0.23 1.24 0.50 -0.55 -0.23 2.88 -2.30
1 1 231 9 39 0.06 -0.13 0.09 -1.01 -0.72 -0.61 -2.29
1 1 232 8 174 0.1 -0.28 -0.78 -0.62 0.06 -0.61 -0.06
1 1 233 14 307 0.12 0.86 0.92 -0.36 0.06 0.55 0.57
1 1 234 20 156 0.14 -0.32 0.92 -1.18 -1.07 -0.61 -0.03
1 1 235 4 31 0.06 -0.72 0.92 -1.18 -1.11 -0.61 -1.37
1 1 236 15 282 0.23 0.77 0.09 -0.21 -0.25 0.55 0.78
1 1 237 5 360 0.04 0.64 0.09 0.82 1.63 -0.61 1.52
1 1 238 9 363 0.24 0.55 -0.78 0.74 0.06 2.88 0.28
1 1 239 22 275 0.13 -0.24 0.92 0.60 0.06 -0.61 0.29
1 1 240 8 65 0.06 -1.29 -1.20 -0.66 -0.72 -0.61 0.70
1 1 241 10 420 0.06 1.19 -0.78 3.06 3.19 -0.61 0.47
1 1 242 5 401 0.04 1.15 1.38 0.82 1.63 -0.61 1.52
1 1 243 6 212 0.13 -0.98 0.92 0.60 -0.72 -0.61 -0.84
1 1 244 12 354 0.31 0.33 -0.78 1.95 1.63 -0.61 0.01
1 1 245 13 432 0.16 1.49 0.09 3.06 3.19 0.55 -0.92
1 1 246 12 157 0.09 -0.32 -0.78 -0.65 -0.72 0.55 0.74
1 1 247 16 239 0.1 0.32 -0.78 0.31 0.06 -0.61 0.70
1 1 248 4 95 0.13 -0.28 -0.78 -0.45 -0.72 -0.61 -1.67
1 1 249 5 5 0.09 -1.34 -1.20 -1.01 -0.87 -0.61 -1.67
1 1 250 8 249 0.19 -0.02 0.09 0.29 -0.13 -0.61 1.52
1 1 251 9 145 0.15 0.02 -0.83 -0.66 -0.72 -0.61 0.62
1 1 252 12 33 0.06 -1.36 -0.78 -1.18 -1.11 -0.61 0.83
1 1 253 4 139 0.04 -0.42 -0.78 -0.43 -0.72 -0.61 0.07
1 1 254 9 313 0.16 0.60 0.92 -0.55 0.06 0.55 -2.12
1 1 255 8 353 0.16 0.73 2.67 0.52 0.06 -0.61 0.94
1 1 256 21 348 0.37 0.78 -0.90 0.36 1.63 -0.12 -1.36
1 1 257 10 47 0.09 -1.32 -0.78 -0.74 -0.72 -0.61 -1.26
1 1 258 4 13 0.09 -2.06 -0.78 -1.21 -1.11 -0.61 -0.19
1 1 259 19 201 0.12 0.22 0.92 -0.75 -0.72 -0.61 0.18
1 1 260 16 151 0.09 -0.37 0.92 -1.18 -1.11 -0.61 0.64
1 1 261 11 222 0.07 0.27 -0.78 -0.08 0.06 -0.61 0.08
1 1 262 8 34 0.05 -0.42 0.09 -0.89 -0.72 -0.61 -2.27
1 1 263 23 205 0.09 -0.44 -0.78 0.30 0.06 -0.61 0.42
1 1 264 12 314 0.06 0.19 -0.78 0.82 1.63 -0.61 0.44
1 1 265 6 223 0.16 1.03 0.09 -0.47 -0.72 -0.61 0.45
1 1 266 13 374 0.21 0.55 0.95 0.76 1.63 0.55 0.36
1 1 267 10 295 0.08 0.07 0.92 0.38 0.06 -0.61 1.52
1 1 268 5 54 0.07 -1.53 -0.78 -1.04 -0.72 -0.61 0.72
1 1 269 8 260 0.16 0.03 0.09 -0.35 0.06 0.55 0.10
1 1 270 15 133 0.05 -0.42 -0.78 -0.69 -0.72 -0.61 0.10
1 1 271 14 9 0.26 -0.90 -0.78 -0.18 -0.80 2.88 -0.36
1 1 272 16 333 0.09 0.68 -0.78 0.84 1.63 -0.61 0.82
1 1 273 17 138 0.1 -0.25 -0.78 -0.61 -0.72 -0.61 -0.42
1 1 274 10 124 0.2 -0.84 0.09 -1.18 -1.11 0.55 0.60
1 1 275 12 418 0.38 0.92 0.00 1.65 1.63 2.88 -0.96
1 1 276 7 200 0.08 -0.56 -1.20 0.40 0.06 -0.61 0.96
1 1 277 11 64 0.07 -1.02 0.09 -1.18 -1.11 -0.61 0.89
1 1 278 7 435 0.8 2.12 0.92 5.82 0.40 -0.12 0.43
1 1 279 8 26 0.07 -1.03 -0.78 -0.94 -0.72 -0.61 -1.67
1 1 280 7 192 0.15 -0.55 0.09 -0.51 0.06 -0.61 0.53
1 1 281 12 247 0.17 0.85 0.92 -0.42 -0.72 -0.61 -0.31
1 1 282 18 182 0.1 -0.31 0.92 -0.68 -0.72 -0.61 0.23
1 1 283 16 419 0.45 1.11 2.67 0.55 0.75 2.88 0.67
1 1 284 11 213 0.1 -0.08 -1.20 0.15 0.06 -0.61 0.10
1 1 285 7 50 0.09 -0.53 -0.90 -0.80 -0.72 -0.61 -1.67
1 1 286 5 56 0.04 -1.10 -1.20 -1.12 -0.72 -0.61 0.30
1 1 287 12 351 0.32 -0.28 0.71 2.20 0.06 0.55 -0.92
1 1 288 4 12 0.05 -2.09 -0.78 -1.21 -1.11 -0.61 0.76
1 1 289 17 137 0.21 -0.14 -1.00 -0.80 -0.72 2.88 0.36
1 1 290 4 4 0.03 -0.76 0.92 -1.29 -1.11 -0.61 -2.25
1 1 291 14 161 0.14 -0.93 -0.78 -0.02 -0.72 0.55 0.48
1 1 292 10 180 0.26 -0.61 1.10 -0.88 -0.87 0.55 1.32
1 1 293 17 433 0.18 1.67 0.92 3.06 3.19 0.55 -0.81
1 1 294 7 209 0.09 0.59 -0.78 -0.41 -0.72 0.55 -0.19
1 1 295 9 27 0.07 -1.53 -1.20 -1.15 -0.72 -0.61 0.88
1 1 296 10 257 0.14 0.24 0.92 -0.47 0.06 -0.61 -1.31
1 1 297 9 259 0.08 -0.33 0.09 0.82 0.06 -0.61 -0.74
1 1 298 5 88 0.15 -1.68 -0.78 -0.25 -0.87 0.55 0.15
1 1 299 6 271 0.22 -0.15 0.09 0.18 -0.20 0.55 1.52
1 1 300 14 48 0.15 -1.05 -0.90 -1.18 -1.08 0.55 0.25
1 1 301 12 178 0.13 -0.18 0.92 -0.73 -0.72 -0.61 -0.94
1 1 302 5 162 0.08 -0.10 0.92 -0.85 -0.72 -0.61 -1.67
1 1 303 2 165 0.03 0.21 0.09 -1.18 -0.72 -0.61 0.87
1 1 304 17 336 0.15 0.38 2.67 0.54 0.06 -0.61 0.13
1 1 305 4 415 0.11 2.53 -0.78 -0.05 1.63 2.88 0.48
1 1 306 21 306 0.12 0.33 0.92 0.31 0.06 0.55 0.37
1 1 307 20 61 0.08 -1.45 -0.78 -0.67 -0.72 -0.61 -0.74
1 1 308 12 160 0.19 -0.22 0.09 -0.66 -0.65 -0.61 -0.93
1 1 309 12 240 0.09 0.30 -0.78 0.33 0.06 -0.61 0.12
1 1 310 11 154 0.17 -0.20 -0.93 -0.82 -0.72 0.55 0.41
1 1 311 1 434 0 2.90 0.92 0.32 4.76 0.55 0.50
1 1 312 5 226 0.16 -0.80 0.59 -0.08 0.06 -0.61 1.32
1 1 313 6 72 0.11 -0.72 0.09 -1.18 -1.11 -0.61 -0.65
1 1 314 7 135 0.11 0.29 0.09 -0.74 -0.72 -0.61 -2.32
1 1 315 13 28 0.06 -1.22 -1.20 -1.18 -1.11 -0.61 0.70
1 1 316 6 30 0.13 -1.70 -0.85 -0.34 -0.72 -0.61 -1.30
1 1 317 22 279 0.27 0.51 0.09 -0.50 -0.18 0.55 -2.00
1 1 318 11 288 0.16 0.05 0.92 -0.29 0.06 0.55 0.13
1 1 319 10 337 0.17 1.33 0.92 0.17 0.06 0.55 0.98
1 1 320 7 171 0.17 -0.78 0.09 0.14 -0.72 -0.61 1.30
1 1 321 12 183 0.13 -0.22 1.26 -0.68 -0.72 -0.61 1.31
1 1 322 10 74 0.05 -0.62 -1.20 -1.12 -0.72 -0.61 0.59
1 1 323 12 287 0.32 0.72 0.50 -0.10 0.06 -0.61 -1.33
1 1 324 7 238 0.12 -0.11 0.92 -0.51 0.06 -0.61 0.31
1 1 325 9 68 0.14 -0.97 0.09 -0.90 -0.89 -0.61 1.52
1 1 326 5 163 0.14 0.04 -0.78 -0.60 -0.72 0.55 -1.36
1 1 327 4 421 0.16 2.70 2.67 1.77 0.06 0.55 -0.54
1 1 328 10 389 0.17 0.73 -0.78 -0.54 -0.09 2.88 -2.18
1 1 329 11 147 0.14 -1.20 0.92 -0.61 -0.72 -0.61 -0.60
1 1 330 12 37 0.07 -1.10 -1.20 -1.18 -1.11 -0.61 0.11
1 1 331 10 373 0.19 1.04 0.09 0.79 1.63 0.55 0.41
1 1 332 11 430 0.57 3.31 0.23 3.32 1.63 -0.61 0.36
1 1 333 3 199 0.15 -0.99 0.64 -0.28 0.06 -0.61 -1.30
1 1 334 9 101 0.14 -1.49 -0.78 -0.31 -0.76 0.55 -0.55
1 1 335 12 159 0.15 -1.18 -0.95 -0.08 0.06 -0.61 0.62
1 1 336 14 100 0.24 -0.94 -0.87 -0.77 -0.80 0.55 -0.49
1 1 337 15 73 0.07 -0.62 -0.78 -1.18 -1.11 -0.61 0.11
1 1 338 11 311 0.12 0.44 1.38 0.42 0.06 -0.61 1.29
1 1 339 6 334 0.18 2.25 0.37 0.46 0.06 -0.61 0.63
1 1 340 13 43 0.07 -1.51 -1.20 -1.08 -0.72 -0.61 0.12
1 1 341 19 246 0.14 -0.18 -0.91 0.33 0.06 0.55 0.61
1 1 342 9 103 0.07 -0.57 -1.20 -0.70 -0.72 -0.61 0.67
1 1 343 4 438 0.16 3.40 2.67 1.77 0.06 2.88 -0.54
1 1 344 18 228 0.13 -0.38 -0.85 0.82 0.06 -0.61 0.22
1 1 345 9 254 0.21 0.26 0.09 -0.57 -0.11 0.55 -1.08
1 1 346 7 274 0.14 -0.69 0.09 0.70 0.06 0.55 -0.87
1 1 347 9 319 0.13 1.26 0.92 0.48 0.06 -0.61 0.47
1 1 348 13 63 0.14 -1.52 -0.91 -0.65 -0.72 -0.61 -0.37
1 1 349 12 264 0.14 0.16 0.09 0.45 0.06 -0.61 0.35
1 1 350 12 356 0.23 0.41 2.67 0.44 0.00 0.55 -0.30
1 1 351 6 127 0.14 -0.90 -0.99 -0.85 0.06 -0.61 0.87
1 1 352 7 262 0.05 -0.35 -0.78 0.87 0.06 0.55 0.47
1 1 353 12 405 0.32 0.53 0.92 1.89 1.63 0.55 -1.11
1 1 354 15 173 0.14 0.01 -0.78 -0.08 -0.72 -0.61 0.76
1 1 355 17 59 0.12 -0.88 -0.86 -1.16 -0.95 -0.61 -0.44
1 1 356 14 119 0.08 -1.11 -0.78 -0.64 -0.72 0.55 0.59
1 1 357 17 225 0.18 -0.57 0.92 -0.29 -0.72 0.55 0.57
1 1 358 7 322 0.06 0.30 -1.20 0.82 1.63 -0.61 0.32
1 1 359 10 195 0.17 -0.43 -0.78 -0.59 0.06 0.55 0.00
1 1 360 10 38 0.06 -0.14 -0.78 -0.63 -0.72 -0.61 -2.33
1 1 361 10 111 0.12 -1.24 0.09 -0.88 -0.72 -0.61 -0.01
1 1 362 11 204 0.12 0.02 0.92 -0.47 -0.72 -0.61 -0.45
1 1 363 16 220 0.18 -0.35 0.95 0.14 -0.72 -0.61 0.40
1 1 364 23 283 0.12 -0.47 0.92 0.18 0.06 0.55 0.32
1 1 365 25 89 0.05 -1.20 -0.78 -0.68 -0.72 -0.61 0.48
1 1 366 8 302 0.23 -0.52 0.09 -0.08 -0.52 2.88 -0.21
1 1 367 16 329 0.25 0.22 -0.78 -0.01 0.06 2.88 0.93
1 1 368 28 299 0.1 0.56 0.92 0.33 0.06 -0.61 0.59
1 1 369 6 99 0.1 -0.96 0.09 -0.71 -0.72 -0.61 -1.27
1 1 370 10 345 0.11 0.71 0.09 0.82 1.63 -0.61 0.68
1 1 371 15 234 0.17 -0.59 -0.84 0.32 0.06 0.55 1.42
1 1 372 8 285 0.07 0.02 0.09 0.34 0.06 0.55 0.93
1 1 373 9 18 0.06 -1.58 -1.20 -1.18 -1.11 -0.61 0.91
1 1 374 21 376 0.17 0.34 2.67 0.46 0.06 0.55 1.45
1 1 375 8 439 0.34 1.29 -0.14 3.06 3.19 2.88 -1.07
1 1 376 9 79 0.06 -0.81 -0.78 -1.13 -0.72 -0.61 0.82
1 1 377 23 344 0.11 1.09 -0.80 0.83 1.63 -0.61 0.68
1 1 378 10 166 0.11 -0.60 -0.82 0.36 -0.72 -0.61 0.51
1 1 379 21 400 0.14 1.65 0.92 0.68 1.63 0.55 0.66
1 1 380 2 197 0.1 1.06 -0.35 -0.55 -0.72 -0.61 -1.08
1 1 381 9 143 0.13 -0.90 0.92 -0.69 -0.72 -0.61 -1.12
1 1 382 17 81 0.1 -0.79 -0.83 -0.65 -0.72 -0.61 -1.12
1 1 383 24 179 0.23 -0.12 2.67 -0.87 -0.86 -0.61 1.13
1 1 384 13 243 0.19 0.28 0.95 0.08 -0.72 -0.61 0.70
1 1 385 16 231 0.21 0.11 -0.83 -0.36 0.06 0.55 -1.37
1 1 386 11 292 0.13 -0.51 0.92 0.14 0.06 0.55 1.34
1 1 387 13 330 0.31 0.11 0.16 -0.13 -0.24 2.88 0.53
1 1 388 18 19 0.08 -0.80 -0.78 -0.93 -0.72 -0.61 -2.30
1 1 389 17 217 0.1 -0.14 -1.20 0.36 0.06 -0.61 0.71
1 1 390 3 32 0.01 -0.46 -0.78 -0.50 -0.72 -0.61 -2.35
1 1 391 12 298 0.16 0.55 0.92 0.26 0.06 -0.61 -0.19
1 1 392 6 1 0.1 -1.48 -1.13 -1.29 -1.11 -0.61 -2.29
1 1 393 11 14 0.21 -1.25 -0.90 -1.03 -0.93 0.55 -1.30
1 1 394 12 190 0.2 -0.73 0.92 -0.69 -0.75 0.55 -0.36
1 1 395 7 308 0.08 0.61 0.92 0.85 0.06 -0.61 0.47
1 1 396 9 325 0.17 1.28 0.92 -0.50 -0.72 0.55 -2.24
1 1 397 16 428 0.11 1.37 -0.78 3.06 3.19 0.55 -0.61
1 1 398 3 256 0.02 -0.51 -0.78 0.82 0.06 0.55 -0.62
1 1 399 20 349 0.31 0.64 -0.91 0.74 1.63 0.55 0.62
1 1 400 19 224 0.15 -0.52 -0.80 0.82 0.06 -0.61 -0.71
1 1 401 13 320 0.12 0.29 0.92 0.72 0.06 0.55 -0.39
1 1 402 10 58 0.06 -1.16 -1.20 -1.03 -0.72 -0.61 0.06
1 1 403 9 397 0.23 0.59 0.82 0.37 0.06 2.88 1.41
1 1 404 6 429 0.17 1.57 0.92 3.06 3.19 -0.42 0.47
1 1 405 6 396 0.17 2.16 0.92 0.73 1.63 -0.61 0.43
1 1 406 13 324 0.24 1.07 -0.81 -0.47 -0.72 2.88 0.12
1 1 407 18 235 0.15 0.03 0.92 -0.59 -0.72 0.55 0.49
1 1 408 8 120 0.1 -1.10 -0.78 -0.03 -0.72 -0.61 0.98
1 1 409 15 148 0.08 -0.56 0.09 -0.71 -0.72 -0.61 0.83
1 1 410 10 51 0.06 -1.32 -1.20 -0.78 -0.72 -0.61 0.92
1 1 411 14 424 0.52 1.53 1.29 3.20 1.63 0.55 0.46
1 1 412 13 117 0.09 -0.82 0.92 -1.18 -1.11 -0.61 0.77
1 1 413 8 379 0.15 1.21 0.50 0.46 1.63 -0.61 -1.37
1 1 414 12 248 0.09 0.14 -0.78 0.85 0.06 -0.61 0.48
1 1 415 11 84 0.06 -0.89 -1.20 -0.73 -0.72 -0.61 0.52
1 1 416 7 42 0.1 -1.66 -1.20 -0.83 -0.72 -0.61 0.72
1 1 417 7 149 0.1 -0.80 -0.90 -0.66 0.06 -0.61 0.12
1 1 418 8 221 0.06 -0.50 -0.78 0.82 0.06 -0.61 -1.30
1 1 419 13 46 0.06 -1.29 -0.78 -1.18 -1.11 -0.61 0.39
1 1 420 8 136 0.11 -0.16 0.09 -0.77 -0.72 -0.61 -1.67
1 1 421 11 67 0.09 -0.73 -0.82 -1.18 -1.11 -0.61 0.56
1 1 422 26 423 0.31 2.54 0.92 0.26 1.63 2.88 0.30
1 1 423 6 404 0.17 0.98 0.92 1.73 1.63 0.55 -0.50
1 1 424 14 172 0.08 -0.48 0.92 -0.73 -0.72 -0.61 0.90
1 1 425 4 60 0.02 -0.68 -0.78 -1.18 -1.11 -0.61 0.87
1 1 426 9 206 0.11 0.40 -0.78 -0.53 0.06 -0.61 0.27
1 1 427 12 316 0.11 0.23 -0.82 0.82 1.63 -0.61 -0.32
1 1 428 11 196 0.17 -0.84 0.09 0.03 -0.72 0.55 0.48
1 1 429 16 129 0.05 -0.41 -0.78 -0.68 -0.72 -0.61 0.57
1 1 430 10 91 0.22 -1.18 -0.82 -0.08 -0.56 -0.61 1.52
1 1 431 12 378 0.47 0.20 0.71 3.15 0.06 -0.42 -0.47
1 1 432 7 367 0.11 2.27 0.92 0.55 0.06 -0.61 -1.42
1 1 433 6 241 0.15 0.04 0.92 -0.65 -0.59 0.55 -1.07
1 1 434 15 431 0.1 1.19 -0.78 3.06 3.19 0.55 -1.15
1 1 435 12 368 0.29 1.22 0.44 -0.45 -0.72 2.88 0.77
1 1 436 15 210 0.17 0.22 -0.89 -0.11 0.06 -0.61 -1.39
1 1 437 10 113 0.06 -0.49 0.09 -1.18 -1.11 -0.61 0.70
1 1 438 24 110 0.06 -0.86 -0.78 -0.68 -0.72 -0.61 0.47
1 1 439 6 134 0.14 -1.58 -0.78 0.27 -0.72 0.55 -1.07
1 1 440 9 412 0.09 1.46 2.67 0.82 1.63 -0.61 1.29

Now let us understand what each column in the above summary table means:

All the columns after this contain the centroid of each cell. Together these centroids form the codebook, that is, the collection of all centroids (codewords).
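
To make the idea of a codebook concrete, here is a minimal sketch on made-up data. It assumes `stats::kmeans` as a stand-in for the package's own quantizer; the names `toy_data`, `km` and `codebook` are illustrative and are not objects from this vignette.

```r
set.seed(240)
# Hypothetical data: 200 rows, 3 scaled features (not the vignette's data)
toy_data <- scale(matrix(rnorm(200 * 3), ncol = 3,
                         dimnames = list(NULL, c("f1", "f2", "f3"))))

# Quantize the rows into 5 cells (k-means used here as a stand-in quantizer)
km <- kmeans(toy_data, centers = 5, nstart = 5)

# Codebook: one centroid (codeword) per cell, i.e. the column means of its points
codebook <- aggregate(as.data.frame(toy_data),
                      by = list(Cell = km$cluster), FUN = mean)

# Number of points that fall in each cell
codebook$n <- as.vector(table(km$cluster))

print(round(codebook, 2))
```

Each row of `codebook` plays the role of one row of the summary table: a cell identifier, the number of points in the cell, and the centroid value for every feature.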

7.2 Step 2: Data Projection

For more detailed information on Data Projection please refer to section 3 of this vignette.

Let's view the projected 2D centroids obtained by applying Sammon's projection to the compressed data received from vector quantization. For the sake of brevity, we display only the first six rows.


# Cells of the first level; each cell stores its projected 2D centroid in $pt
hvt_torus_coordinates <- hvt.results[[2]][[1]][["1"]]
coordinates_value <- lapply(seq_along(hvt_torus_coordinates), function(x) {
  hvt_torus_coordinates[[x]]$pt
})
# Stack the per-cell coordinates into one data frame
centroid_coordinates <- do.call(rbind.data.frame, coordinates_value)
colnames(centroid_coordinates) <- c("x_coord", "y_coord")
centroid_coordinates$Row.No <- as.numeric(row.names(centroid_coordinates))
centroid_coordinates <- centroid_coordinates %>% dplyr::select(Row.No, x_coord, y_coord)
centroid_coordinates <- centroid_coordinates %>% data.frame() %>% round(4)
Table(head(centroid_coordinates))
Row.No x_coord y_coord
1 -12.7717 -2.8475
2 -14.2963 4.0788
3 14.9524 -23.1304
4 -4.7877 -22.8443
5 -13.6585 5.9424
6 -23.6167 15.2525

Let's visualize the 2D Sammon's projection of the centroids on a plane.

# centroid_coordinates is a data frame with columns "x_coord" and "y_coord"
ggplot(centroid_coordinates, aes(x_coord, y_coord)) +
  geom_point(color = "blue") +
  labs(x = "X", y = "Y")
Figure 14: Sammons 2D Plot for 440 cells
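
The projection step can be reproduced in isolation. The sketch below assumes the `sammon` function from the MASS package and a made-up codebook (`toy_centroids`); it is not the package's internal code, only an illustration of how high-dimensional centroids are mapped to 2D coordinates.

```r
library(MASS)  # provides sammon()

set.seed(240)
# Hypothetical codebook: 50 centroids with 6 scaled features (not the vignette's data)
toy_centroids <- matrix(rnorm(50 * 6), nrow = 50, ncol = 6)

# Sammon's mapping works on the pairwise distances between centroids
d <- dist(toy_centroids)
proj <- sammon(d, k = 2, trace = FALSE)

# One (x, y) pair per centroid, in the same layout as centroid_coordinates
toy_coordinates <- data.frame(
  Row.No  = seq_len(nrow(proj$points)),
  x_coord = round(proj$points[, 1], 4),
  y_coord = round(proj$points[, 2], 4)
)
print(head(toy_coordinates))

# Final stress: lower values mean the 2D distances track the original ones more closely
print(proj$stress)
```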

7.3 Step 3: Tessellation

For more detailed information on Voronoi tessellation, please refer to section 4 of this vignette.

Now, we have obtained the centroid coordinates resulting from the application of Sammon’s projection.

For better visualisation, let’s plot the Voronoi tessellation using the plotHVT function.

# Voronoi tessellation plot for level one
muHVT::plotHVT(hvt.results,
               line.width = c(0.2),
               color.vec = c("#141B41"),
               centroid.size = 0.01,
               maxDepth = 1)
Figure 15: The Voronoi Tessellation for layer 1 shown for the 440 cells in the dataset ’computers’
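
What the tessellation encodes can be sketched without any plotting package: a location in the plane belongs to the Voronoi cell of its nearest centroid. The example below uses made-up centroids (`cent`) and a regular grid of query locations; all names are illustrative, and this is a discrete approximation rather than the geometric construction used by plotHVT.

```r
set.seed(240)
# Hypothetical 2D centroids standing in for projected coordinates
cent <- data.frame(x_coord = runif(8, -10, 10), y_coord = runif(8, -10, 10))

# Regular grid of query locations covering the plane
grid <- expand.grid(x = seq(-10, 10, length.out = 60),
                    y = seq(-10, 10, length.out = 60))

# Squared distance from every grid point (rows) to every centroid (columns)
d2 <- outer(grid$x, cent$x_coord, "-")^2 + outer(grid$y, cent$y_coord, "-")^2

# A location belongs to the Voronoi cell of its nearest centroid
grid$cell <- max.col(-d2, ties.method = "first")

# Share of the plane covered by each cell
print(round(table(grid$cell) / nrow(grid), 3))
```

Plotting `grid$x` against `grid$y` colored by `grid$cell` would show the cell boundaries as the places where the nearest centroid changes.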

Heat Maps

Now let’s plot the Voronoi Tessellation with the heatmap overlaid for all the features in the computers dataset for better visualization.

The heatmaps displayed below provide a visual representation of the spatial characteristics of the computers data, allowing us to observe patterns and trends in the distribution of each feature (n, price, speed, hd, ram, screen, ads). In each heatmap, the green shades highlight regions with higher values, while the indigo shades indicate regions with the lowest values. By analyzing these heatmaps, we can gain insights into the variations and relationships between these features within the computers data.


muHVT::hvtHmap(
  hvt.results,
  trainComputers,
  child.level = 1,
  hmap.cols = "n",
  line.width = c(0.2),
  color.vec = c("#141B41"),
  palette.color = 6,
  centroid.size = 0.01,
  show.points = TRUE,
  quant.error.hmap = 0.2,
  n_cells.hmap = 15
)
Figure 16: The Voronoi Tessellation with the heat map overlaid over the No. of entities in each cell in the ’computers’ dataset


muHVT::hvtHmap(
  hvt.results,
  trainComputers,
  child.level = 1,
  hmap.cols = "price",
  line.width = c(0.2),
  color.vec = c("#141B41"),
  palette.color = 6,
  centroid.size = 0.01,
  show.points = TRUE,
  quant.error.hmap = 0.2,
  n_cells.hmap = 15
)
Figure 17: The Voronoi Tessellation with the heat map overlaid over the variable price in the ’computers’ dataset


muHVT::hvtHmap(
  hvt.results,
  trainComputers,
  child.level = 1,
  hmap.cols = "hd",
  line.width = c(0.2),
  color.vec = c("#141B41"),
  palette.color = 6,
  centroid.size = 0.01,
  show.points = TRUE,
  quant.error.hmap = 0.2,
  n_cells.hmap = 15
)
Figure 18: The Voronoi Tessellation with the heat map overlaid over the variable hd in the ’computers’ dataset

muHVT::hvtHmap(
  hvt.results,
  trainComputers,
  child.level = 1,
  hmap.cols = "ram",
  line.width = c(0.2),
  color.vec = c("#141B41"),
  palette.color = 6,
  centroid.size = 0.01,
  show.points = TRUE,
  quant.error.hmap = 0.2,
  n_cells.hmap = 15
)
Figure 19: The Voronoi Tessellation with the heat map overlaid over the variable ram in the ’computers’ dataset

muHVT::hvtHmap(
  hvt.results,
  trainComputers,
  child.level = 1,
  hmap.cols = "screen",
  line.width = c(0.2),
  color.vec = c("#141B41"),
  palette.color = 6,
  centroid.size = 0.01,
  show.points = TRUE,
  quant.error.hmap = 0.2,
  n_cells.hmap = 15
)
Figure 20: The Voronoi Tessellation with the heat map overlaid over the variable screen in the ’computers’ dataset


muHVT::hvtHmap(
  hvt.results,
  trainComputers,
  child.level = 1,
  hmap.cols = "ads",
  line.width = c(0.2),
  color.vec = c("#141B41"),
  palette.color = 6,
  centroid.size = 0.01,
  show.points = TRUE,
  quant.error.hmap = 0.2,
  n_cells.hmap = 15
)
Figure 21: The Voronoi Tessellation with the heat map overlaid over the variable ads in the ’computers’ dataset
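
The shading logic behind such a heat map can be sketched in a few lines of base R. The example assumes that each cell is colored by a per-cell summary of the chosen feature (the mean is used here) and uses made-up cell assignments and an illustrative indigo-to-green ramp; none of the names or colors below are taken from hvtHmap itself.

```r
set.seed(240)
# Hypothetical inputs: a cell id per row and one feature value per row
cell_id <- sample(1:6, 120, replace = TRUE)
price   <- rnorm(120, mean = cell_id)  # made-up feature values

# Per-cell summary that drives the shading (mean assumed for illustration)
cell_mean <- tapply(price, cell_id, mean)

# Map the summaries onto an indigo-to-green ramp (colors are illustrative)
palette_fn <- colorRampPalette(c("#4B0082", "#FFFFFF", "#008000"))
n_shades   <- 9
shade_idx  <- cut(as.vector(cell_mean), breaks = n_shades, labels = FALSE)
cell_color <- palette_fn(n_shades)[shade_idx]

heat_table <- data.frame(Cell.ID    = names(cell_mean),
                         mean_price = round(as.vector(cell_mean), 2),
                         color      = cell_color)
print(heat_table)
```

The cell with the highest mean receives the greenest shade and the cell with the lowest mean the most indigo one, which mirrors how the figures above are read.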

7.4 Step 4: Prediction (predictHVT)

For more detailed information on prediction please refer to section 5 of this vignette.

Raw Testing Dataset

Now, let's have a look at the randomly selected raw testing dataset (1252 data points) before we pass it to the predictHVT function for scoring. For the sake of brevity, we display only the first six rows.

Table(head(testComputers_data))
Row.No price speed hd ram screen ads
5008 1540 33 214 4 15 191
5009 3094 50 1000 24 15 191
5010 1794 50 214 4 14 191
5011 2408 100 270 4 14 191
5012 2454 66 720 16 15 191
5013 1969 66 1000 8 14 191

Now that we have built the model, let us use the test dataset to predict which cell and which level each point belongs to.

predictHVT(data,
           hvt.results,
           hmap.cols = NULL,
           child.level = 1,
           ...)

The important parameters for the function predictHVT are as below

set.seed(240)
predictions <- muHVT::predictHVT(
  testComputers,
  hvt.results,
  child.level = 1,
  line.width = c(1.2),
  color.vec = c("#141B41"),
  quant.error.hmap = 0.2,
  n_cells.hmap = 440,
  normalize = TRUE
)

Let’s see which cell and level each point belongs to and check the mean absolute difference between actual and predicted values. For the sake of brevity, we will only show the first 10 rows.

# Training column names and scaling parameters stored in the model summary
summary_list <- hvt.results[[3]]
train_colnames <- names(summary_list[["nodes.clust"]][[1]][[1]])

# Scale the test data with the training means and standard deviations
scaled_test_data <- scale(
  testComputers[, train_colnames],
  center = summary_list$scale_summary$mean_data[train_colnames],
  scale = summary_list$scale_summary$std_data[train_colnames]
)
testComputers <- scaled_test_data

act_cols <- c("price_act", "speed_act", "hd_act", "ram_act", "screen_act", "ads_act")
pred_cols <- c("price_pred", "speed_pred", "hd_pred", "ram_pred", "screen_pred", "ads_pred")

# Actual (scaled) values of each test point
data1 <- data.frame(testComputers)
data1$Row.No <- row.names(testComputers)
data1 <- data1 %>% dplyr::select(Row.No, price, speed, hd, ram, screen, ads)
colnames(data1) <- c("Row.No", act_cols)

# Predicted values: the centroid of the cell each point is mapped to
data2 <- predictions[["scoredPredictedData"]]
data2 <- data2 %>% dplyr::select(Cell.ID, price, speed, hd, ram, screen, ads)
colnames(data2) <- c("Cell.ID", pred_cols)

# Mean absolute difference between actual and predicted values, per row
combined <- cbind(data1, data2)
combined$diff <- rowMeans(abs(combined[, act_cols] - combined[, pred_cols]))
rownames(combined) <- NULL
options(scipen = 999)
combined %>% head(100) %>%
  as.data.frame() %>%
  Table(scroll = TRUE, limit = 10)
Row.No price_act speed_act hd_act ram_act screen_act ads_act Cell.ID price_pred speed_pred hd_pred ram_pred screen_pred ads_pred diff
5008 -1.2287446 -0.7832055 -0.6759793 -0.7181490 0.5490304 -0.8403063 100 -1.2287 -0.7832 -0.6760 -0.7181 0.5490 -0.8403 0.0000261
5009 1.3847942 0.0921775 3.0630691 3.1928493 0.5490304 -0.8403063 432 1.3848 0.0922 3.0631 3.1928 0.5490 -0.8403 0.0000242
5010 -0.8015639 0.0921775 -0.6759793 -0.7181490 -0.6148117 -0.8403063 126 -0.8016 0.0922 -0.6760 -0.7181 -0.6148 -0.8403 0.0000244
5011 0.2310699 2.6668336 -0.4095840 -0.7181490 -0.6148117 -0.8403063 278 0.2311 2.6668 -0.4096 -0.7181 -0.6148 -0.8403 0.0000244
5012 0.3084333 0.9160675 1.7310925 1.6284500 0.5490304 -0.8403063 405 0.3084 0.9161 1.7311 1.6285 0.5490 -0.8403 0.0000267
5013 -0.5072464 0.9160675 3.0630691 0.0640507 -0.6148117 -0.8403063 378 -0.5072 0.9161 3.0631 0.0641 -0.6148 -0.8403 0.0000295
5014 1.0652496 0.0921775 3.0630691 3.1928493 0.5490304 -0.8403063 432 1.0652 0.0922 3.0631 3.1928 0.5490 -0.8403 0.0000315
5015 -1.2203355 0.9160675 -0.0765899 0.0640507 -0.6148117 -0.8403063 251 -1.2203 0.9161 -0.0766 0.0641 -0.6148 -0.8403 0.0000243
5016 -0.9293817 0.9160675 -0.0765899 -0.7181490 -0.6148117 -0.8403063 212 -0.9294 0.9161 -0.0766 -0.7181 -0.6148 -0.8403 0.0000213
5017 -1.1211085 -0.7832055 -0.6759793 -0.7181490 -0.6148117 -0.8403063 61 -1.1211 -0.7832 -0.6760 -0.7181 -0.6148 -0.8403 0.0000170
hist(combined$diff, breaks = 20, col = "blue", main = "Mean Absolute Difference", xlab = "Difference")
Figure 22: Mean Absolute Difference

We can see the predictions for the points in the table above. Each point is predicted by the centroid of the cell it is mapped to; that centroid is the codeword (predictor) for the cell.
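
The scoring logic described in this step can be sketched end to end on made-up data. The example assumes a k-means codebook as a stand-in for the trained map and nearest-centroid assignment by Euclidean distance; the objects `train`, `test` and `codebook` are hypothetical, and the code is not the implementation of predictHVT.

```r
set.seed(240)
# Hypothetical training data and a 4-cell codebook (stand-ins, not the vignette's objects)
train <- matrix(rnorm(300 * 2, mean = 50, sd = 10), ncol = 2,
                dimnames = list(NULL, c("f1", "f2")))
mu  <- colMeans(train)
sdv <- apply(train, 2, sd)
train_scaled <- scale(train, center = mu, scale = sdv)
codebook <- kmeans(train_scaled, centers = 4, nstart = 5)$centers

# New data is scaled with the TRAINING mean and standard deviation, not its own
test <- matrix(rnorm(20 * 2, mean = 50, sd = 10), ncol = 2,
               dimnames = list(NULL, c("f1", "f2")))
test_scaled <- scale(test, center = mu, scale = sdv)

# Squared distance from every test row (rows) to every centroid (columns)
d2 <- sapply(seq_len(nrow(codebook)), function(k) {
  rowSums(sweep(test_scaled, 2, codebook[k, ])^2)
})

# Assign each test row to its nearest centroid
cell <- max.col(-d2, ties.method = "first")

# The prediction for a point is the centroid of its assigned cell
pred <- codebook[cell, , drop = FALSE]
rownames(pred) <- NULL
diff <- rowMeans(abs(test_scaled - pred))

print(head(data.frame(Cell.ID = cell, round(pred, 4), diff = round(diff, 4))))
```

Here `diff` is the same kind of quantity as the diff column above: the mean absolute difference between a scaled test point and the centroid used to predict it.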

8 Executive Summary

9 Applications

  1. Pricing Segmentation - The package can be used to discover groups of similar customers based on customer spend patterns and to understand the price sensitivity of customers.

  2. Market Segmentation - The package can help in market segmentation, where we have to identify micro and macro segments. The method used in this package can do both kinds of segmentation in one go.

  3. Anomaly Detection - This method can help us categorize system behavior over time and find anomalies when there are changes in the system, e.g., finding fraudulent claims in healthcare insurance.

  4. The package can help us understand the underlying structure of the data. Suppose we want to analyze a curved surface such as a sphere or a vase; we can approximate it with many small low-order polygons in the form of tessellations using this package.

  5. In biology, Voronoi diagrams are used to model a number of different biological structures, including cells and bone microarchitecture.

  6. Using the base idea of Systems Dynamics, these diagrams can also be used to depict customer state changes over a period of time.

10 References

  1. Topology Preserving Maps : https://users.ics.aalto.fi/jhollmen/dippa/node9.html

  2. Vector Quantization : https://en.wikipedia.org/wiki/Vector_quantization

  3. K-means : https://en.wikipedia.org/wiki/K-means_clustering

  4. Sammon’s Projection : http://en.wikipedia.org/wiki/Sammon_mapping

  5. Voronoi Tessellations : http://en.wikipedia.org/wiki/Centroidal_Voronoi_tessellation

  6. Embedding : https://en.wikipedia.org/wiki/Embedding